Probability models applied by decision makers in a wide variety of
contexts must be able to provide inferences under conditions of change.
A stochastic process whose probabilistic properties change through time
can be described as a nonstationary process. In this dissertation a model
involving normal and lognormal processes is developed for handling a
particular form of nonstationarity within a Bayesian framework. Two
uncertainty conditions are considered: in one the location parameter, μ,
is assumed to be unknown and the spread parameter, σ, is assumed to be
known; in the other both parameters are assumed to be unknown. Comparing
the nonstationary model with the stationary one, it is shown that:

1. more uncertainty (of a particular definition) is present
under nonstationarity than under stationarity;

2. since the variance of a lognormal distribution, V(x), is a
function of μ and σ², nonstationarity in μ means that both the mean and
the variance of the random variable, x, are nonstationary, so that the
lognormal case provides a generalization of the normal results;
and

3. as additional observations are collected, uncertainty about
stochastically-varying parameters is never entirely eliminated.

The asymptotic behavior of the model has important implications
for the decision maker. An implication of the stationary Bayesian model
for normal and lognormal processes is that as additional observations are
collected, parameter uncertainty is reduced and (in the limit) eliminated
altogether. In contrast, for the nonstationary model considered in this
dissertation, the following inferential results are obtained:

1. for the case of a lognormal or normal model, a particular form
of stochastic parameter variation implies a treatment of data involving
the use of all observations in a differential weighting scheme;

and

2. random parameter variation produces important differences in
the limiting behavior of the prior and predictive distributions, since
under nonstationarity the limiting values of the parameters of the
posterior and predictive distributions cannot be determined clearly.

Practical implications of the results for the areas of Cost-
Volume-Profit Analysis and life testing are discussed, with emphasis on
the predictive distribution for the outcome of a future observation from
the data generating process. It is emphasized that Cost-Volume-Profit
(CVP) and life testing models ideally should include the changing
character of the process by allowing for changes in the parametric
description of the process through time. Failure to recognize
nonstationarity when it is present has a number of implications in the
CVP and life-testing contexts that are explored in the dissertation. For
example, inferences are improperly obtained if the nonstationarity is
ignored, and prediction interval coverage probabilities are overstated,
since uncertainty is greater (in a particular sense) when nonstationarity
is present.

CHAPTER ONE

INTRODUCTION
1.1 Introduction

Uncertainty is an essential and intrinsic part of the human
condition. The opinions we express, the conclusions we reach and the
decisions we make are often based on beliefs concerning the probability
of uncertain events such as the result of an experiment, the future value
of an investment or the number of units to be sold next year. If
management, for instance, were certain about what circumstances would
exist at a given time, the preparation of a forecast would be a trivial
matter. Virtually all situations faced by management involve uncertainty,
however, and judgments must be made and information must be gathered to
reduce this uncertainty and its effects. One of the functions of applied
mathematics is to provide information which may be used in making
decisions or forming judgments about unknown quantities.

Several early studies by econometricians and statisticians
examined the problem of constructing a model whose output is as
close as possible to the observed data from the real system and which
reflects all the uncertainty that the decision maker has. Mathematical
models for statistical problems, for instance, have some element of
uncertainty incorporated in the form of a probability measure. The model
usually involves the formulation of a probability distribution of the
uncertain quantities. This element of uncertainty is carried through
the analysis to the inferences drawn. The equations that form the
mathematical model are usually specified to within a number of parameters
or coefficients which must be estimated. The unknown parameters are
usually assumed to be constant and the problem of model identification
is reduced to one of constant parameter estimation.

There are several reasons for suspecting that the parameters
of many models constructed by engineers and econometricians are not
constant but in fact time-varying. For instance, it has become
increasingly clear that to assume that behavioral and technological
relationships are stable over time is, in many cases, completely
untenable on the basis of economic theory. Several recent studies provide
support for the claim that the parameters of distributions of
stock-price-related variables may change over time [see Barry and Winkler
(1976)]. In engineering, particularly in reliability theory, the origins
of parameter variation are usually not very hard to pinpoint. Component
wear, variation in inputs or component failure are some very common
reasons for parameter variations. The major objective of construction of
engineering models is control and regulation of the real system modeled.
Therefore, much of the research in that area has concentrated on devising
ways to make the output of the model insensitive to parameter variation.
Similarly, in forecasting models for economic variables, researchers have
been greatly concerned with time-varying parameters of the distributions
of interest. In this area the problem of varying parameters has received
increased attention because there is increasing evidence that the common
regression assumption of stable parameters often appears invalid.

In this dissertation we plan to study a particular type of random
parameter variation which is likely to be applicable when nonstationarity
over time is present. The modeling of nonstationarity that we are going to
present assumes that successive values in time of the unknown parameter
are related in a stochastic manner; i.e., the parameter variation includes
a component which is a realization of some random process. For purposes
of estimation we are interested in specific realizations of the random
process. When the process generating the unknown parameter is a
nonstationary process over time, the decision maker should be concerned
with a sequence of values of the parameter instead of a single value as in
the usual stationary model; i.e., inferences and decisions concerning the
parameter should reflect the fact that it is changing over time.

If the values of an unknown parameter over time are related
in a stochastic manner, a formal analysis of the situation requires
some assumptions about the stochastic relationship. For the model of
nonstationarity that we develop in this dissertation, the specification
of the stochastic relationship between values of the parameter is
sufficient. Moreover, it is assumed that this relationship is stationary
(usually referred to as second-order stationarity) in the sense that the
stochastic relationship is the same for any pair of consecutive values
of the unknown parameter.
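
As a minimal sketch of what a stationary relationship between consecutive parameter values could look like, the following Python fragment assumes a random-walk form, mu[t] = mu[t-1] + e[t], with independent normal shocks of constant variance. The random-walk form, the function name `simulate_process` and all numerical values are illustrative assumptions and are not taken from the text; the only feature meant to mirror the discussion is that the same stochastic rule links every pair of consecutive values.

```python
import random

def simulate_process(mu0, shock_sd, obs_sd, n_periods, seed=0):
    """Simulate a drifting mean and one observation per period.

    Assumed (hypothetical) form: mu[t] = mu[t-1] + e[t], e[t] ~ N(0, shock_sd^2),
    and x[t] ~ N(mu[t], obs_sd^2). The same rule applies in every period.
    """
    rng = random.Random(seed)
    mus, xs = [], []
    mu = mu0
    for _ in range(n_periods):
        mu = mu + rng.gauss(0.0, shock_sd)   # parameter shift between periods
        mus.append(mu)
        xs.append(rng.gauss(mu, obs_sd))     # observation given the current mean
    return mus, xs

mus, xs = simulate_process(mu0=10.0, shock_sd=0.5, obs_sd=1.0, n_periods=5)
for t, (m, x) in enumerate(zip(mus, xs), start=1):
    print(f"t={t}  mu={m:.3f}  x={x:.3f}")
```

Setting `shock_sd` to zero recovers the usual stationary case, in which a single value of the parameter applies in all periods.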

We want to gain more precise information about the structure
of the time-varying parameters and to obtain estimated relationships
that are suitable for forecasting. The model to be developed makes it
possible to draw inferences about the structure of the relationship at
every point in time. There are problems in accounting, life testing theory,
finance and a variety of other areas that can benefit from nonstationary
parameter estimation techniques.

1.2 Summary of Results and Overview of Dissertation
The goals of this dissertation are to develop a rigorous model
for handling nonstationarity within a Bayesian framework, to compare
inferences from stationary and nonstationary models, and to investigate
inferential applications in the areas of Cost-Volume-Profit Analysis and
life testing models involving nonstationarity. Probably the most important
advantage of the new work to be presented in this dissertation is the
increased versatility it adds to the nonstationary Bayesian model derived
by Winkler and Barry (1973). The new results enlarge the range of real and
important problems involving univariate and multivariate nonstationary
normal and lognormal processes which can be handled. Another advantage
is the simplicity of the updating methods for the efficient handling of
the estimation of unknown parameters and the prediction of the outcome
of a future sample.

A survey of the most relevant literature is provided in Chapter
Two to set the stage for the new developments in the remainder of the
dissertation. In this survey we present an overview of probabilistic
Cost-Volume-Profit (CVP) Analysis and discuss the most important articles
that deal with CVP under conditions of uncertainty. The review of the
literature includes a section on life testing models, emphasizing the use
of Bayesian techniques in life testing. It is emphasized that most
of the research done in these two areas neglects the problem of
nonstationarity. A special section is presented to discuss some important
articles about modeling nonstationary processes.

As is mentioned in Chapter Two, most research concerned with
the normal and lognormal distributions has considered only stationary
situations. That is, the parameters and distributions used are assumed
to remain the same in all periods. In Chapter Three we develop a Bayesian
model of nonstationarity for normal and lognormal processes. In it we
describe essential features of the Bayesian analysis of normal and
lognormal processes under nonstationarity, like the prior, posterior and
predictive distributions. Two uncertainty conditions are considered in
this chapter: in one the location parameter, μ, is assumed to be unknown
and the spread parameter, σ, is assumed to be known; in the other,
both parameters are assumed to be unknown. Comparing the nonstationary
model with the stationary one, it is shown that:

1. more uncertainty (of a particular definition) is present
under nonstationarity than under stationarity;

2. since the variance of a lognormal distribution, V(x), is a
function of μ and σ², nonstationarity in μ means that both the mean and
the variance of the random variable, x, are nonstationary, so that the
lognormal case provides a generalization of the normal results;
and,

3. as additional observations are collected, uncertainty about
stochastically-varying parameters is never entirely eliminated.

The results discussed in Chapter Three have to do with the
period-to-period effects of random parameter variation upon the posterior
and predictive distributions. However, the asymptotic behavior of the
model has important implications for the decision maker. An implication
of the stationary Bayesian model for normal and lognormal processes is
that as additional observations are collected, parameter uncertainty is
reduced and (in the limit) eliminated altogether. Such an implication is
inconsistent with observed real world behavior, largely because the
conditions under which inferences are made typically change across time.
The common dictum [see Dickinson (1974)] has been to eliminate some
observations in the case of changing parameters so that only the most
recent observations are considered. In Chapter Four we show that:

1. for the case of a lognormal or normal model, a particular
form of stochastic parameter variation implies a treatment of data
involving the use of all observations in a differential weighting scheme,
and,

2. random parameter variation produces important differences
in the limiting behavior of the prior and predictive distributions, since
under nonstationarity the limiting values of some of the parameters of the
posterior and predictive distributions cannot be determined clearly.
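
The two points above can be illustrated with a small numerical sketch. It assumes, purely for illustration, a normal process with known observation variance whose mean follows a random walk, mu[t] = mu[t-1] + e[t] with e[t] ~ N(0, shift_var); this form and every name and number in the code are assumptions rather than the specification developed in Chapter Three. Under this assumed form the posterior variance of the mean and the weight that each past observation receives in the posterior mean follow simple recursions.

```python
import math

def posterior_variances(v0, obs_var, shift_var, n):
    """Posterior variance of the mean after each of n observations.

    Assumed model (illustrative): x[t] ~ N(mu[t], obs_var) and
    mu[t] = mu[t-1] + e[t] with e[t] ~ N(0, shift_var).
    shift_var = 0 gives the usual stationary normal model.
    """
    out, v = [], v0
    for _ in range(n):
        r = v + shift_var                 # variance of mu[t] before seeing x[t]
        v = r * obs_var / (r + obs_var)   # variance of mu[t] after seeing x[t]
        out.append(v)
    return out

def observation_weights(v0, obs_var, shift_var, n):
    """Weight each observation x[1..n] receives in the posterior mean at time n."""
    gains, v = [], v0
    for _ in range(n):
        r = v + shift_var
        k = r / (r + obs_var)             # weight on the newest observation
        gains.append(k)
        v = (1.0 - k) * r
    weights = []
    for s in range(n):
        w = gains[s]
        for j in range(s + 1, n):
            w *= 1.0 - gains[j]           # older data are discounted each period
        weights.append(w)
    return weights

stationary = posterior_variances(v0=4.0, obs_var=1.0, shift_var=0.0, n=200)
drifting = posterior_variances(v0=4.0, obs_var=1.0, shift_var=0.25, n=200)
# Fixed point of the variance recursion for shift_var = 0.25, obs_var = 1.0
limit = (-0.25 + math.sqrt(0.25 ** 2 + 4 * 0.25 * 1.0)) / 2

print(f"stationary variance after 200 obs: {stationary[-1]:.4f}")
print(f"drifting variance after 200 obs:   {drifting[-1]:.4f} (fixed point {limit:.4f})")
print("weights on last 5 of 20 obs:",
      [round(w, 3) for w in observation_weights(4.0, 1.0, 0.25, 20)[-5:]])
```

In this sketch every observation keeps a positive weight, but older observations count for less, and the posterior variance settles at a positive value instead of shrinking toward zero. With `shift_var=0.0` the weights are all equal and the variance decreases without limit, which is the stationary behavior described earlier.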


One objective of this dissertation is to develop Bayesian
prediction intervals for future observations that come from normal and
lognormal data generating processes. In Chapter Four we address the problem
of constructing prediction intervals for normal, Student, lognormal and
logStudent distributions. It is pointed out that it is easy to construct
these intervals for the normal and Student distributions but that it is
rather difficult for the lognormal and logStudent distributions. An
algorithm is presented to compute the Bayesian prediction intervals for
the lognormal and logStudent distributions. Bayesian prediction intervals
under nonstationarity are compared with classical, certainty equivalent
and Bayesian stationary intervals.

In Chapter Five we discuss the application of the results of
Chapters Three and Four concerning nonstationarity to the area of CVP
analysis and life testing models. Practical implications of our results
for these two areas are discussed with emphasis on the predictive
distribution for the outcome of a future observation from the data
generating process. It is emphasized that CVP and life testing models
ideally should include the changing character of the process by allowing
for changes in the parametric description of the process through time. It
is shown that, for the case of normal and lognormal data generating
processes under a particular form of stochastic parameter variation, the
presence of nonstationarity produces greater uncertainty for the decision
maker. This greater uncertainty is reflected by an increase in the
predictive variance of profits for CVP models, by an increase in the
predictive variance of life length for life testing models, and by an
increase in the width of intervals required to contain particular
coverage probabilities.

Chapter Six provides conclusions, limitations and suggestions
for further research. Since stationarity assumptions are often quite
unrealistic, it is concluded in that chapter that the introduction of
possible nonstationarity greatly increases the realism and applicability
of statistical inference methods, in particular of Bayesian procedures.


CHAPTER TWO

SURVEY OF PERTINENT LITERATURE
The primary purpose of the research in this dissertation is
to present a Bayesian model of nonstationarity in normal and lognormal
processes with applications in Cost-Volume-Profit analysis and life
testing models. A survey of the most relevant literature is provided
in this chapter and will serve to set the stage for the new developments
in the remainder of the thesis.

In this survey, three areas are covered. In Section 2.1 we
present an overview of probabilistic Cost-Volume-Profit (CVP) analysis and
discuss the most important articles that deal with CVP under conditions
of uncertainty. In Section 2.2 we discuss life testing models with an
emphasis on the exponential, gamma, Weibull and lognormal models. The
review of the literature includes a special section on Bayesian techniques
used in life testing. Finally, in Section 2.3 a survey is presented of
some important articles about modeling nonstationary processes.

2.1 Cost-Volume-Profit (CVP) Analysis
Management requires realistic and accurate information to
aid in decision making. Cost-Volume-Profit (CVP) analysis is a widely
accepted generator of information useful in decision making processes.
CVP analysis essentially consists of examining the relationship
between changes in volume (output) and changes in profit. The
fundamental assumption in all types of CVP decisions is that the firm, or
a department, or other type of costing unit, possesses a fixed set
of resources that commits the firm to a certain level of fixed costs
for at least a short-run period. The decision problem facing a manager
is to determine the most efficient and productive use of this fixed
set of resources relative to output levels and output mixes. The scope
of CVP analysis ranges from determination of the optimal output level
for a single-product department to the determination of the optimal output
mix of a large multi-product firm. All these decisions rely on simple
relationships between changes in revenues and costs and changes in
output levels or mixes. All CVP analyses are characterized by their
emphasis on cost and revenue behavior over various ranges of output
levels and mixes.

The determination of the selling price of a product is a
complex matter that is often affected by forces partially or entirely
beyond the control of management. Nevertheless, management must
formulate pricing policies within the bounds permitted by the market
place. Accounting can play an important role in the development of policy
by supplying management with special reports on the relative
profitability of its various products, the probable effects of
contemplated changes in selling price and other CVP relationships.

The unit cost of producing a commodity is affected by such
factors as the inherent nature of the product, the efficiency of
operations, and the volume of production. An increase in the quantity
produced is ordinarily accompanied by a decrease in unit cost,
provided the volume attained remains within the limits of plant capacity.
Quantitative data relating to the effect on income of changes in
unit selling price, sales volume, production volume, production costs,
and operating expenses help management to improve the relationships
among these variables. If a change in selling price appears to be
desirable or, because of competitive pressure, unavoidable, the possible
effect of the change on sales volume and product cost needs to be
considered.

A mathematical expression of the profit equation of CVP
analysis is:

(2.1.1)      Z = Q (P - V) - F,

where Z = total profits,
      Q = sales volume in units,
      P = unit selling price,
      V = unit variable cost,
and   F = total fixed costs.
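
A deterministic evaluation of equation (2.1.1) can be sketched in a few lines of Python. The function names and the figures used are hypothetical, and the break-even helper is only a standard rearrangement of the same equation for Z = 0, added for illustration.

```python
def profit(quantity, price, unit_variable_cost, fixed_costs):
    """Total profit Z = Q (P - V) - F from equation (2.1.1)."""
    return quantity * (price - unit_variable_cost) - fixed_costs

def break_even_quantity(price, unit_variable_cost, fixed_costs):
    """Sales volume at which Z = 0; requires P > V."""
    margin = price - unit_variable_cost
    if margin <= 0:
        raise ValueError("unit contribution margin must be positive")
    return fixed_costs / margin

# Hypothetical figures: Q = 5000 units, P = 12, V = 7, F = 20000
print(profit(5000, 12.0, 7.0, 20000.0))
print(break_even_quantity(12.0, 7.0, 20000.0))
```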

This accounting model of analysis has been traditionally
used by the management accountant in profit planning. This use,
however, typically ignores the uncertainty associated with the firm's
operation, thus severely limiting its applicability. During the past 12
years, accountants have attempted to resolve this problem by
introducing stochastic aspects into the analysis.

The applicability of probabilistic models for this analysis
has been claimed because of the realism of such models, i.e.,
decisions are always accompanied by uncertainty. Thus, the ideal model
is one that gives a probability distribution of the criterion variable,
profit, and that fully recognizes the uncertainty faced by the firm.
The realism of such a model is dependent on logical assumptions for
the input variables and rigorous methodology in obtaining the output
distribution. Further, we hope that the model can accommodate a wide
range of uses. For example, the capability to handle dependence among
input variables adds a highly useful dimension.

Jaedicke and Robichek (1964) first introduced risk into the
model. They assumed the following relation among the means

(2.1.2)        E(Z) = E(Q) [E(P) - E(V)] - E(F),

where E(·) denotes mathematical expectation.

In addition they assumed that the key variables were all normally
distributed and that the resulting profit is also normally distributed.
Thus, by computing the mean value and standard deviation of the
resulting profit function, various probabilistic measures of profit
can be obtained. This model has been depicted as a limit analysis,
since the assumptions of independent model parameters and of
normality of the resulting profit function are not true except in
limiting cases. According to Ferrara, Hayya and Nachman (1972), the
product of two normally and independently distributed variables will
approximate normality if the sum of the two coefficients of variation
is less than or equal to .12.
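
The calculation described above can be sketched as follows. The mean follows equation (2.1.2); the variance uses the standard formula for a product of independent random variables, which is supplied here as an assumption about how the standard deviation would be computed. All input figures and the function name are hypothetical.

```python
from statistics import NormalDist

def profit_moments(q, p, v, f):
    """Mean and variance of Z = Q(P - V) - F for independent inputs.

    Each argument is a (mean, variance) pair. Independence of all four
    inputs is assumed, as in the model described above.
    """
    (mq, vq), (mp, vp), (mv, vv), (mf, vf) = q, p, v, f
    mm, vm = mp - mv, vp + vv                      # unit margin M = P - V
    mean = mq * mm - mf                            # equation (2.1.2)
    var = vq * vm + vq * mm ** 2 + vm * mq ** 2 + vf
    return mean, var

q, p = (5000.0, 250.0 ** 2), (12.0, 0.25 ** 2)     # hypothetical inputs
v, f = (7.0, 0.15 ** 2), (20000.0, 1000.0 ** 2)
mean, var = profit_moments(q, p, v, f)

# Sum of the coefficients of variation of Q and of the margin P - V
cv_sum = q[1] ** 0.5 / q[0] + (p[1] + v[1]) ** 0.5 / (p[0] - v[0])

approx = NormalDist(mean, var ** 0.5)              # normal approximation to Z
print(f"E(Z) = {mean:.0f}, SD(Z) = {var ** 0.5:.0f}")
print(f"sum of coefficients of variation = {cv_sum:.3f}")
print(f"P(Z > 0) is roughly {1 - approx.cdf(0.0):.3f}")
```

With these inputs the sum of the two coefficients of variation is below .12, so the normal approximation falls inside the range quoted from Ferrara, Hayya and Nachman; with more dispersed inputs the final probability should be read with caution.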

Others have confronted the same problem of how to identify
the resulting profit distribution when it is not close to a normal
distribution. They have noted that it is often difficult to obtain
analytical expressions for the product of random variables. Because
the appropriate distributional forms for the product of the variable
functions may not be known, Buzby (1974) suggests the application of
Tchebycheff's theorem to stochastic Cost-Volume-Profit analysis. This
theorem, however, permits the analyst to derive only some very crude
bounds on the probabilities of interest, so its value as a
decision-making tool is limited. Liao (1975) illustrated how model
sampling (also called distribution sampling) coupled with a curve-fitting
technique can be used to overcome the above problems associated
with stochastic CVP analysis. In his paper, the illustration of the
proposed approach to stochastic CVP analysis is first developed through
a consideration of the Jaedicke-Robichek problem, wherein the model
parameters are independent and normally distributed. After that, the
illustration problem is modified to accommodate dependent and non-normal
variates in the problem.
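
To give a sense of how crude a Tchebycheff-type bound can be, the sketch below compares the bound on the probability of a loss with the value obtained when profit is taken to be normal. The profit moments are hypothetical and the function name is an illustrative assumption; the bound itself uses only the mean and standard deviation.

```python
from statistics import NormalDist

def chebyshev_loss_bound(mean, sd):
    """Upper bound on P(Z <= 0) from Chebyshev's inequality, for mean > 0.

    P(Z <= 0) <= P(|Z - mean| >= mean) <= (sd / mean)**2, capped at 1.
    No distributional form is assumed beyond a finite variance.
    """
    if mean <= 0:
        raise ValueError("bound is only informative for a positive mean")
    return min(1.0, (sd / mean) ** 2)

mean, sd = 5000.0, 2500.0                       # hypothetical profit moments
bound = chebyshev_loss_bound(mean, sd)
normal_value = NormalDist(mean, sd).cdf(0.0)
print(f"Chebyshev bound on P(loss): {bound:.3f}")
print(f"Normal-theory P(loss):      {normal_value:.3f}")
```

The bound holds for any distribution with these two moments, which is why it is so much looser than the value obtained under a specific distributional assumption.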

Hilliard and Leitch (1975) developed a model for CVP analysis
assuming a more tractable distribution for the inputs of the equation.
It allows for dependent relationships and permits a rigorous
derivation of the distribution of profit. The problems of assuming price
and quantity to be independent are pointed out. The authors also pointed
out that assuming sales to be normally distributed implies a positive
probability of negative sales.

Probabilities and tolerance intervals for the Hilliard and
Leitch model are obtained from tables of the normal distribution.
The only assumptions required for the model are (1) quantity and
contribution margin are lognormally distributed random variables
and (2) fixed costs are deterministic. The assumption that sales
quantity and contribution margin are bivariate lognormally distributed
eliminates the possibility of negative sales and of selling prices
below variable costs, and it has the nice additional property that the
product of two bivariate lognormal random variables is also lognormal.
Thus, we can allow for uncertainty in price and quantity and still
have a closed form expression for the probability distribution of
gross profits. Hilliard and Leitch cannot, however, assume that price and
variable costs are marginally lognormally distributed and still have the
contribution margin be lognormally distributed. Similarly, if fixed
costs are assumed to be lognormally distributed too, net profits will
not be lognormally distributed.
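
The closed form mentioned above follows from the fact that ln(Q M) = ln Q + ln M is normal when (ln Q, ln M) is bivariate normal. A minimal sketch under that assumption is given below; the parameter names, the correlation `rho` between the logarithms and the numerical inputs are hypothetical and are not taken from Hilliard and Leitch.

```python
import math
from statistics import NormalDist

def gross_profit_log_params(mu_q, sd_q, mu_m, sd_m, rho):
    """Mean and standard deviation of ln(Q * M) for bivariate normal (ln Q, ln M)."""
    mu = mu_q + mu_m
    var = sd_q ** 2 + sd_m ** 2 + 2.0 * rho * sd_q * sd_m
    return mu, math.sqrt(var)

def prob_net_profit_positive(mu, sd, fixed_costs):
    """P(Q * M > F) for lognormal Q * M and deterministic fixed costs F > 0."""
    return 1.0 - NormalDist(mu, sd).cdf(math.log(fixed_costs))

# Hypothetical inputs: median quantity 5000, median margin 5, negative dependence.
mu, sd = gross_profit_log_params(math.log(5000), 0.10, math.log(5.0), 0.08, rho=-0.3)
p_positive = prob_net_profit_positive(mu, sd, 20000.0)
print(f"median gross profit: {math.exp(mu):.0f}")
print(f"P(net profit > 0):   {p_positive:.3f}")
```

Only the normal distribution function is needed, which is consistent with the remark that probabilities for this model can be read from normal tables.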

Adar, Barnea and Lev (1977) presented a model for CVP
analysis under uncertainty that combines the probability characteristics
of the environment variables with the risk preferences of decision
makers. The approach is based on recently suggested economic models
of the firm's optimal output decision under uncertainty, which were
modified within the mean-standard deviation framework to provide for
a cost-volume-utility analysis allowing management to: (1) determine
optimal output, (2) consider the desirability of alternative plans
involving changes in fixed and variable costs, expected price and
uncertainty of price and technology changes and (3) determine the
economic consequences of fixed cost variances.

Dickinson (1974) addresses the problem of CVP analysis under
uncertainty by examining the reliability of using the usual methods
of estimating the means and variances of the past distributions of
sales demand. He emphasized that, when the expectation and variance
of profits are estimated from past data, it is important to
differentiate between what, in fact, are estimated and what are true
values of the parameters. In other words, he pointed out that the
estimated expectation of profits, Ê(π), reflects estimation risk and is
not equal to E(π). Classical confidence intervals were used for the
expected value of profits, E(π), for the variance of profits, Var(π),
and for probabilities of various profit levels. However, Dickinson
misinterpreted the classical confidence intervals that he obtains in
his paper. When a classicist constructs a 90 percent confidence interval
for μ, for example, he would state that in the long run, 90 percent
of all such intervals will contain the true value of μ. The classical
statement is based on long-run frequency considerations. The classicist
is absolutely opposed to the interpretation that the 90 percent refers
to the probability that the true universe mean lies within the specified
interval. In the eyes of a classicist, a unique true value exists for
the universe mean, and therefore the value of the universe mean
cannot be treated as a random variable. Dickinson's paper also
illustrates the difficulty of obtaining the probability statements of
greatest interest to management in a classical approach. His analysis
is only able to provide confidence intervals of probabilities of
profit levels rather than the profit level probabilities themselves.
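
The long-run frequency reading of a 90 percent confidence interval can be checked by simulation. The sketch below is illustrative only: it assumes a normal population with known standard deviation, and the function name, sample size and population values are hypothetical.

```python
import random
from statistics import NormalDist

def coverage_rate(true_mean, sd, n, level, trials, seed=1):
    """Fraction of known-variance confidence intervals that cover true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)     # two-sided critical value
    half_width = z * sd / n ** 0.5
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(true_mean, sd) for _ in range(n)) / n
        hits += abs(xbar - true_mean) <= half_width  # interval covers the mean?
    return hits / trials

rate = coverage_rate(true_mean=100.0, sd=15.0, n=10, level=0.90, trials=20000)
print(f"share of intervals covering the true mean: {rate:.3f}")
```

The 90 percent describes the procedure over repeated samples. Any single computed interval either contains the fixed true mean or does not, which is the distinction drawn in the paragraph above.
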

The problem of parameter uncertainty has been neglected by
the people that have studied CVP analysis under uncertainty. In the
Bayesian approach, uncertainty regarding the parameters of probability
models is reflected in prior and posterior probability statements
regarding the parameters. Marginal distributions of variables which
depend on those parameters may be obtained by integrating out the
distribution of the parameters, thereby obtaining predictive
distributions [see Roberts (1965) and Zellner (1971)] of the quantities of
interest to the manager. These predictive distributions permit one
to make valid probability statements regarding the important
quantities, such as profits.
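
As a minimal sketch of integrating out a parameter, consider the simplest conjugate case: a normal observation with known variance and a normal posterior for its mean. This case is chosen here only for illustration, and the function name and numbers are hypothetical. The predictive variance is then the sum of the posterior variance and the observation variance.

```python
from statistics import NormalDist

def predictive_distribution(post_mean, post_var, obs_var):
    """Predictive distribution of a future observation x.

    Assumes x | mu ~ N(mu, obs_var) with obs_var known and a normal
    posterior mu ~ N(post_mean, post_var); integrating mu out gives
    x ~ N(post_mean, post_var + obs_var).
    """
    return NormalDist(post_mean, (post_var + obs_var) ** 0.5)

pred = predictive_distribution(post_mean=5000.0, post_var=300.0 ** 2, obs_var=400.0 ** 2)
plug_in = NormalDist(5000.0, 400.0)   # treats the estimated mean as if it were known
print(f"P(x < 4000), predictive: {pred.cdf(4000.0):.4f}")
print(f"P(x < 4000), plug-in:    {plug_in.cdf(4000.0):.4f}")
```

Treating the estimate as the true parameter understates the chance of an unfavorable outcome in this example, because the uncertainty about the mean is left out of the spread.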

Nonstationarity is another important aspect related to CVP
analysis that no one has considered. In a world that is continually
changing, it is important to recognize that the parameters that
describe a process at a particular point in time may not do so at
a later point in time. In the case of the variable sales, for instance,
experience shows that it is typically affected by a variety of
economic and political events. Thus, a CVP model ideally should include
the changing character of the process by allowing for changes in the
parametric description of the process through time. Failure to
recognize the nonstationary conditions may result in misleading
inferences.

In this dissertation the problem of Cost-Volume-Profit
analysis will be considered from a Bayesian viewpoint, and inferences under
a special case of nonstationarity will be considered. Also, the Bayesian
results under nonstationarity will be compared with those results
that can be obtained under a stationary Bayesian model, and the
Bayesian model will be compared with some alternative approaches.
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2.2  Life  Testing  Models 


2.2.1  Introduction 


The development of recent technology has given special
importance to several problems concerning the improvement of the
effectiveness of devices of various kinds. It is often important to impose
extraordinarily high standards on the performance of these devices,
since a failure in the performance could bring disastrous consequences.
The quality of production plays an important role in today's life. An
interruption in the operation of a regulating device can lead not
only to deterioration in the quality of a manufactured product but
also to damage of the industrial process. From a purely economic
viewpoint high reliability is desirable to reduce costs. However, since
it is costly to achieve high reliability, there is a tradeoff. The
failure of a part or component results not only in the loss of the
failed item but often results in the loss (at least temporarily) of
some larger assembly or system of which it is part. There are
numerous examples in which failures of components have caused losses
of millions of dollars and personal losses. The space program is an
excellent example where even the lives of some astronauts were lost
due to failure in the system. The following authors have considered
the statistical theory of reliability and provide a good set of
references on the subject: Mendenhall (1958), Buckland (1960), Birnbaum
(1962), Govindarajulu (1964), Mann, Schaefer and Singpurwalla (1973),
and Canfield and Borgman (1975).

Reliability theory is the discipline that deals with procedures
to ensure the maximum effectiveness of manufactured articles
and that develops methods of evaluating the quality of systems from
known qualities of their component parts. A large number of problems
in reliability theory have a mathematical character and require the
use of mathematical tools and the development of new ones for their
solution. Areas like probability theory and mathematical statistics
are necessary to solve some of the problems found in reliability
theory. No matter how hard the company works to maintain constant
conditions during a production process, fluctuations in the production
factors lead to a significant variation in the properties of the
finished products. In addition, articles are subjected to different
conditions in the course of their use. To maintain and to increase
the reliability of a system or of an article requires both material
expenditures and scientific research.

Statistical theory and methodology have played an influential
role in the development of reliability theory since the
publication of the paper by Epstein and Sobel (1953). Four statistical
concepts provide the basis for estimating relevant parameters and
testing hypotheses about the life characteristic of the subject
matter. These concepts are:

(i) the distribution function of some variable which is a
direct or indirect measure of the response (life time) to usage in
a particular environment;

(ii) the associated probability density (or frequency)
function;

(iii) the survival probability function; and

(iv) the conditional failure rate.

A failure distribution provides a mathematical description
of the length of life of a device, structure or material. Consider
a piece of equipment which has been in a given environment, e. The
fatigue life of this piece of equipment is defined to be the length
of time, T(e), this piece of equipment operates before it fails. Full
information about e would fully determine T(e), so that given e, T(e)
would not be random. One source of randomness in life is in uncertainty
about the environment, i.e., T(e) is a random variable because e is
random. Equipment has different survival characteristics depending on
the conditions under which it is operated, and e provides a statement
of what conditions are but does not determine T(e) fully.

The reliability of an operating system is defined as the
probability that the system will perform satisfactorily within
specified conditions over a given future time period when the system
starts operating at some time origin. Different distributions can
be distinguished according to their failure rate function, which
is known in the literature of reliability as a hazard rate [see
Barlow and Proschan (1965)]. The hazard rate (denoted by h), which
is a function of time, gives the conditional density of failure at
time t, under the hypothesis that the unit has been functioning
without failure up to that point in time. The conditional failure rate is
defined as:

(2.2.1)      $h(t) = f(t) / [1 - F(t)] = f(t) / R(t)$,

where

(2.2.2)      $F(t) = \mathrm{Prob}(T \leq t) = \int_0^t f(s)\,ds$,


is  the  probability  that  an  observed  value  of  T  will  be  less  than  or 
equal  to  an  assigned  number  t.  The  reliability  function  (also  called 
the  survival  function)  of  the  random  variable  T  gives  the  probability 
that  T  will  exceed  t  and  is  defined  by 

(2.2.3)  R(t)  =  1  -  F(t)  =  Prob  (T  >  t) . 

The probability density function of the random variable T, $f(t)$,
$0 \leq t < \infty$, is known as the failure density function of the device.
It can be shown that the conditional failure rate and the
distribution function of a random variable are related by

(2.2.4)      $F(t) = 1 - \exp\left[-\int_0^t h(s)\,ds\right]$.
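
The relation between the hazard rate and the distribution function can be checked numerically. The sketch below is only an illustration: it assumes a two-parameter Weibull lifetime with hypothetical shape 2 and scale 3, computes the hazard rate from (2.2.1), and recovers F(t) from the integrated hazard as in (2.2.4). The function names are not from the source.

```python
import math

def weibull_pdf(t, shape, scale):
    """Density of a two-parameter Weibull lifetime (illustrative choice)."""
    return (shape / scale) * (t / scale) ** (shape - 1) * math.exp(-((t / scale) ** shape))

def weibull_cdf(t, shape, scale):
    """Distribution function of the same Weibull lifetime."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def hazard(t, pdf, cdf):
    """Equation (2.2.1): h(t) = f(t) / [1 - F(t)]."""
    return pdf(t) / (1.0 - cdf(t))

def cdf_from_hazard(t, h, steps=2000):
    """Equation (2.2.4): F(t) = 1 - exp(-integral of h from 0 to t), midpoint rule."""
    ds = t / steps
    integral = sum(h((i + 0.5) * ds) for i in range(steps)) * ds
    return 1.0 - math.exp(-integral)

f = lambda t: weibull_pdf(t, 2.0, 3.0)
F = lambda t: weibull_cdf(t, 2.0, 3.0)
h = lambda s: hazard(s, f, F)
recovered = cdf_from_hazard(4.0, h)   # should agree with F(4.0)
```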

The causes of failure can be categorized into three basic
types. It is recognized, however, that there may be more than one
contributing cause to a particular failure and that, in some cases,
there may be no completely clear-cut distinction between some of the
causes. The three classes of failure are infant mortalities, or
early failures, random failures and wearout failures. The behavior
of the hazard rate as a function of time is sometimes known as the
hazard function or life characteristic of the system. For a typical
system that may experience any of the three previously described types
of failure, the life characteristic will appear as in Figure 1. The
representation of the life characteristic has been classically referred
to as the "bathtub curve", wherein the three segments of the curve
represent the three time periods of initial, chance and wearout failure.

[Figure omitted; horizontal axis: Time]

Figure 1. Life characteristics of some systems


The initial failure period is characterized by a high hazard rate
shortly after time t = 0 and a gradual reduction during the initial
period of operation. During the chance failure period, the hazard
rate is constant and generally lower than during the initial period.
Failures in this period are attributed to unusual and unpredictable
environmental conditions occurring during the operating time of the
system or of the device. The hazard rate increases during the wearout
period. Failures in this period are associated with the gradual depletion
of a material, an accumulation of shocks, and so on.
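
A bathtub-shaped life characteristic can be sketched by adding three hazard components: a decreasing one for early failures, a constant one for chance failures and an increasing one for wearout. The construction below is an assumption made for illustration, not a model taken from the source. It uses Weibull-type hazards with hypothetical parameter values, and the middle segment of the resulting curve is only approximately flat. Adding hazards corresponds to a unit that fails at the first of several independent failure causes.

```python
def weibull_hazard(t, shape, scale):
    """Weibull-type hazard rate: shape * t**(shape - 1) / scale**shape."""
    return shape * t ** (shape - 1) / scale ** shape

def bathtub_hazard(t, early=(0.5, 50.0), chance_rate=0.01, wearout=(4.0, 100.0)):
    """Sum of a decreasing, a constant and an increasing hazard component.

    The default (shape, scale) pairs and chance_rate are hypothetical values.
    """
    return (weibull_hazard(t, *early)
            + chance_rate
            + weibull_hazard(t, *wearout))

# Hazard early in life, in mid life and late in life (t must be positive).
early_life, mid_life, late_life = (bathtub_hazard(t) for t in (1.0, 40.0, 150.0))
```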

In the following subsections we will consider the general
properties of some widely used life distributions, the assessment
and use of those distributions, and the literature related to Bayesian
methods in life testing.

2.2.2  Some  Common  Life  Distributions 

2.2.2.1 The Exponential Distribution

In  the  case  of  a  constant  failure  rate  the  distribution  of 
life  is  exponential.  This  case  has  received  the  most  emphasis  in  the 
literature,  since,  in  spite  of  theoretical  limitations,  it  presents 
attractive  statistical  properties  and  is  highly  tractable.  Data 
arising  from  life  tests  under  laboratory  or  service  conditions  are 
often  found  to  conform  to  the  exponential  distribution. 

An acceptable justification for the assumption of an
exponential distribution in life studies was initially presented by Davis
(1952). More recently Barlow and Proschan (1965) have advanced a
mathematical argument to support the plausibility of the exponential
distribution as the failure law of complex equipment. The random variable
T has an exponential distribution if it has a probability density
function of the form

(2.2.5)      $f_T(t) = \sigma^{-1} \exp[-(t - \theta)/\sigma]$,    $t > \theta$, $\sigma > 0$.

The mean and variance of T are $(\sigma + \theta)$ and $\sigma^2$, respectively. In most
applications $\theta$ is taken as zero. For this distribution, the physical
interpretation of a constant hazard function is that, irrespective of
the time elapsed since the start of operation of a system, the
probability that the system fails in the next time interval dt,
given that it has survived to time t, is independent of the elapsed
time t and is constant.
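
The constant hazard interpretation can be illustrated with a short check that the conditional survival probability does not depend on the elapsed time. This is a sketch with the location parameter taken as zero; the function names and the numerical values are hypothetical.

```python
import math

def exp_reliability(t, sigma, theta=0.0):
    """R(t) = exp[-(t - theta)/sigma] for t >= theta, and 1 before theta."""
    return 1.0 if t < theta else math.exp(-(t - theta) / sigma)

def conditional_survival(t, s, sigma):
    """P(T > t + s | T > t) for an exponential lifetime with theta = 0."""
    return exp_reliability(t + s, sigma) / exp_reliability(t, sigma)

# Probability of surviving 5 more time units after 0, 10 and 500 units of operation,
# for a hypothetical mean life sigma = 20. All three values coincide with R(5).
values = [conditional_survival(t, 5.0, 20.0) for t in (0.0, 10.0, 500.0)]
```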

2.2.2.2  The  Gamma  Distribution 

An extremely useful distribution in fatigue and wearout
studies is the gamma distribution. It also has a very important
relationship to the exponential distribution, namely, that the sum of n
independent and identically distributed (i.i.d.) exponential random
variables with common parameters $\theta = 0$ and $\sigma$ is a random variable that
has a gamma distribution with parameters n and $\sigma$. Hence, the
exponential distribution is a special case of the gamma with n = 1.

The random variable T has a gamma distribution if its
probability density function is of the form,

(2.2.6)      $f_T(t) = \{(t - \theta)^{n-1} \exp[-(t - \theta)/\sigma]\} / [\sigma^n \Gamma(n)]$;    $n > 0$, $\sigma > 0$, $t > \theta$.

The standard form of the distribution is obtained by putting $\sigma = 1$ and
$\theta = 0$, giving

(2.2.7)      $f_T(t) = [t^{n-1} \exp(-t)] / \Gamma(n)$,    $t > 0$;

where the gamma function, denoted $\Gamma$, is a mapping of the interval
$(0, \infty)$ into itself and is defined by

(2.2.8)      $\Gamma(n) = \int_0^\infty t^{n-1} \exp(-t)\,dt$.

The probability distribution function of (2.2.7) is

(2.2.9)      $\mathrm{Prob}[T \leq t] = [\Gamma(n)]^{-1} \int_0^t x^{n-1} \exp(-x)\,dx$.

Since a distribution of the form given in equation (2.2.6)
can be obtained from standardized distributions, as in equation (2.2.7),
by the linear transformation $t = (t' - \theta)/\sigma$, there is no difficulty in
deriving formulas for moments, generating functions, etc., for equation
(2.2.6) from those for equation (2.2.7).

One of the most important properties of the distribution is
the reproductive property; if $T_1$ and $T_2$ are independent random variables
each having a distribution of the form (2.2.6), possibly with different
values n', n" of n but with common values of $\sigma$ and $\theta$, then $(T_1 + T_2)$
also has a distribution of this form, with the same value of $\sigma$, with
location parameter $2\theta$, and with n = n' + n".
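
The reproductive property can be checked numerically for the case of zero location parameter. The sketch below is an illustration under that assumption: it convolves two gamma densities that share a scale parameter and compares the result with a single gamma density whose shape is the sum of the two shapes. The function names and numerical values are hypothetical.

```python
import math

def gamma_pdf(t, n, sigma=1.0, theta=0.0):
    """Density (t-theta)^(n-1) exp[-(t-theta)/sigma] / (sigma^n Gamma(n)) for t > theta."""
    if t <= theta:
        return 0.0
    x = t - theta
    return x ** (n - 1) * math.exp(-x / sigma) / (sigma ** n * math.gamma(n))

def convolved_pdf(t, n1, n2, sigma, steps=4000):
    """Density of T1 + T2 at t by midpoint-rule convolution (location parameter 0)."""
    ds = t / steps
    return sum(gamma_pdf((i + 0.5) * ds, n1, sigma)
               * gamma_pdf(t - (i + 0.5) * ds, n2, sigma)
               for i in range(steps)) * ds

# Shapes 2 and 3 with common scale 1.5: the sum should behave like shape 5.
approx = convolved_pdf(6.0, 2.0, 3.0, 1.5)
exact = gamma_pdf(6.0, 5.0, 1.5)
```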

2.2.2.3 The Weibull Distribution

The Weibull distribution was developed by W. Weibull (1951)
of Sweden and used for problems involving fatigue lives of materials.
Three parameters are required to uniquely define a particular Weibull
distribution. Those three parameters are the scale parameter $\sigma$, the
shape parameter n and the location parameter $\theta$.

A random variable T has a Weibull distribution if there are
values of the parameters n (> 0), $\sigma$ (> 0) and $\theta$ such that

(2.2.10)     $Y = [(t - \theta)/\sigma]^n$

has the exponential distribution with probability density function

(2.2.11)     $f_Y(y) = \exp(-y)$,    $y > 0$.

The probability density function of T is given by

(2.2.12)     $f_T(t) = n \sigma^{-1} [(t - \theta)/\sigma]^{n-1} \exp\{-[(t - \theta)/\sigma]^n\}$,    $t > \theta$.

The standard Weibull distribution is obtained by putting $\sigma = 1$ and
$\theta = 0$. The value zero for $\theta$ is by far the most frequently used,
especially in representing distributions of life times.

The Weibull distribution has cumulative distribution function

(2.2.13)     $F_T(t) = 1 - \exp\{-[(t - \theta)/\sigma]^n\}$,

and its mean and variance are

(2.2.14)     $E(t) = \sigma \Gamma(1 + [1/n])$

and

(2.2.15)     $\mathrm{Var}(t) = \sigma^2 \{\Gamma(1 + [2/n]) - \Gamma^2(1 + [1/n])\}$,

respectively. For the two parameter Weibull distribution we have that
the reliability and hazard function are

(2.2.16)     $R_T(t) = \exp[-(t/\sigma)^n]$

and

(2.2.17)     $h_T(t) = n t^{n-1} / \sigma^n$.

When n = 1, the hazard function is a constant. Thus the exponential
distribution is a special case of the Weibull distribution with n = 1.
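
The two-parameter Weibull formulas translate directly into code. The following sketch, with function names chosen for the example, evaluates the mean, variance, reliability and hazard function with the location parameter taken as zero, and can be used to confirm that n = 1 gives the exponential case with a constant hazard.

```python
import math

def weibull_mean(n, sigma):
    """Mean sigma * Gamma(1 + 1/n), location parameter zero."""
    return sigma * math.gamma(1.0 + 1.0 / n)

def weibull_variance(n, sigma):
    """Variance sigma^2 * [Gamma(1 + 2/n) - Gamma(1 + 1/n)^2]."""
    return sigma ** 2 * (math.gamma(1.0 + 2.0 / n) - math.gamma(1.0 + 1.0 / n) ** 2)

def weibull_reliability(t, n, sigma):
    """Reliability exp[-(t/sigma)^n]."""
    return math.exp(-((t / sigma) ** n))

def weibull_hazard(t, n, sigma):
    """Hazard n * t^(n-1) / sigma^n."""
    return n * t ** (n - 1) / sigma ** n

# With n = 1 and a hypothetical sigma = 4 the hazard is 1/4 at every t,
# the mean is sigma and the variance is sigma^2.
constant_hazards = [weibull_hazard(t, 1.0, 4.0) for t in (1.0, 9.0, 30.0)]
```

For n greater than 1 the hazard increases with t, which is why the Weibull distribution is a common choice for wearout failures.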

2.2.2.4  The  Lognormal  Distribution 

The lognormal distribution is also a very popular distribution
in describing wearout failures. This model was developed as a physical
or, more appropriately, biological model associated with the theory
of proportionate effects (see Aitchison and Brown (1957) for a full
description of the distribution, its properties, and its developments).
Briefly, if a random variable is supposed to represent the magnitudes
at successive points of time of, for example, a fatigue crack or the
growth of biological organisms, and the change between any pair of
successive steps or stages is a random proportion of the previous
size, then asymptotically the distribution of the random variable is
lognormal [see Kapteyn (1903)]. This theoretical result imparted some
plausibility to the lognormal distribution for failure problems. Let
$t_1 < t_2 < \ldots < t_n$ be a sequence of random variables that denote the
sizes of a fatigue crack at successive stages of its growth. It is
assumed that the crack growth at stage i, $t_i - t_{i-1}$, is randomly
proportional to the size of the crack, $t_{i-1}$, and that the item fails
when the crack reaches $t_n$. Let $t_i - t_{i-1} = \pi_i t_{i-1}$, i = 1, 2, ..., n, where
$\pi_i$ is a random variable. The $\pi_i$ are assumed to be independently
distributed random variables that need not have a common distribution
for all i's when n is large but that need to be lognormally
distributed otherwise. Thus,

$\pi_i = (t_i - t_{i-1}) / t_{i-1}$,    i = 1, 2, ..., n.

Mann, Schaefer and Singpurwalla (1973) show that $\ln t_n$, the life
length of the item, for large n, is asymptotically normally
distributed, and hence $t_n$ has a lognormal distribution.
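
The proportionate-effects argument can be illustrated by simulation. The sketch below is an assumption-laden example and not the derivation of the cited authors: the growth proportions are drawn from a uniform distribution on (0, 0.1), the crack starts at size 1, and 200 growth stages are simulated 2000 times with a fixed seed. The logarithm of the final size is a sum of many independent terms, so its sample skewness should be close to zero, as expected for a normal variable.

```python
import math
import random

def simulate_log_final_size(n_stages, rng, initial_size=1.0, max_growth=0.1):
    """Return the log of the crack size after n_stages of proportional growth."""
    size = initial_size
    for _ in range(n_stages):
        proportion = rng.uniform(0.0, max_growth)  # hypothetical distribution
        size += proportion * size                   # growth = proportion * previous size
    return math.log(size)

def sample_mean_and_skewness(values):
    """Sample mean and moment-based skewness coefficient."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return mean, m3 / m2 ** 1.5

rng = random.Random(1979)
logs = [simulate_log_final_size(200, rng) for _ in range(2000)]
mean_log, skew_log = sample_mean_and_skewness(logs)
```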
If there is a number $\gamma$ such that

(2.2.18)     $Z = \ln(t - \gamma)$

is normally distributed, then the distribution of t is said to be
lognormal. The distribution of t can be defined by the equation,

(2.2.19)     $U = \Omega + \delta \ln(t - \gamma)$,

where U is a unit normal variable and $\Omega$, $\delta$ and $\gamma$ are parameters. The
probability density function of T is defined by

(2.2.20)     $f_T(t) = \delta [(t - \gamma)\sqrt{2\pi}]^{-1} \exp[-\{\Omega + \delta \ln(t - \gamma)\}^2 / 2]$,    $t > \gamma$.

An alternative, more fashionable notation replaces $\Omega$ and $\delta$ by the
expected value $\mu$ and standard deviation $\sigma$ of $Z = \ln(t - \gamma)$. The two
sets of parameters are related by the equations,

(2.2.21)     $\mu = -\Omega/\delta$

and

(2.2.22)     $\sigma = \delta^{-1}$

so that the distribution of t can be defined by

(2.2.23)     $U = [\ln(t - \gamma) - \mu] / \sigma$

and the probability density function of T by

(2.2.24)     $f_T(t) = [(t - \gamma)\sqrt{2\pi}\,\sigma]^{-1} \exp[-\{\ln(t - \gamma) - \mu\}^2 / (2\sigma^2)]$,    $t > \gamma$.

In many applications, $\gamma$ is known (or assumed) to be zero.
This important case has been given the name two parameter lognormal
distribution. The mean and variance of the two parameter distribution
are given by

(2.2.25)     $E(t) = \exp[\mu + (\sigma^2/2)]$,

and

(2.2.26)     $\mathrm{Var}(t) = [\exp(2\mu)]\,\omega(\omega - 1)$,

where $\omega = \exp(\sigma^2)$.

In addition, the value $t_\alpha$ such that $\Pr(t \leq t_\alpha) = \alpha$ is related to the
corresponding percentile, $U_\alpha$, of the unit normal distribution by the
relation,

(2.2.27)     $t_\alpha = \exp(\mu + U_\alpha \sigma)$.
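
For the two parameter case the mean, variance and percentiles are simple to compute. The sketch below uses the Python standard library class `statistics.NormalDist` for the unit normal percentile; the function names and numerical values are assumptions made for the example.

```python
import math
from statistics import NormalDist

def lognormal_mean(mu, sigma):
    """Mean exp(mu + sigma^2 / 2) of the two parameter lognormal."""
    return math.exp(mu + sigma ** 2 / 2.0)

def lognormal_variance(mu, sigma):
    """Variance exp(2 mu) * omega * (omega - 1), with omega = exp(sigma^2)."""
    omega = math.exp(sigma ** 2)
    return math.exp(2.0 * mu) * omega * (omega - 1.0)

def lognormal_percentile(alpha, mu, sigma):
    """Value t with Pr(T <= t) = alpha: exp(mu + U_alpha * sigma)."""
    u_alpha = NormalDist().inv_cdf(alpha)
    return math.exp(mu + u_alpha * sigma)

def lognormal_cdf(t, mu, sigma):
    """Pr(T <= t), the unit normal distribution function at [ln t - mu] / sigma."""
    return NormalDist().cdf((math.log(t) - mu) / sigma)

# Hypothetical parameters mu = 1.0, sigma = 0.5.
t_90 = lognormal_percentile(0.90, 1.0, 0.5)
round_trip = lognormal_cdf(t_90, 1.0, 0.5)   # should return to 0.90
```

Note that the median, exp(mu), lies below the mean whenever sigma is positive, which reflects the right skewness of the distribution.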

Applications of the lognormal distribution have appeared in
many diverse areas, e.g., environmental health [see Dixon (1937) and
Hill (1963)], air pollution control [see Singpurwalla (1971, 1972) and
Larsen (1969)], and others like economics and insurance claims [see
Wilson and Worcester (1945)]. Application of the distribution is not
only based on empirical observation, but in some cases is supported
by theoretical arguments.

For example, such arguments have been made in the distribution
of particle sizes in natural aggregates and in the closely related
distribution of dust concentration in industrial atmospheres [see
Tomlinson (1957) and Oldham (1965)]. The lognormal distribution has
also been found to be a serious competitor to the Weibull distribution
in representing life time distributions for manufactured products.
Among our references, Adams (1962), Ansley (1967), Epstein (1947,
1948), Farewell and Prentice (1977), Govindarajulu (1977), Goldthwaite
(1961), Gupta (1962), Hald (1952) and Nowick and Berry (1961) refer
to this topic. Other applications in quality control are described
by Ferrell (1958), Morrison (1958) and Rohn (1959). Many of these
applications are also referenced by Aitchison and Brown (1957),
Finney (1941) and Gupta et al. (1974).

2.2.3 Traditional Approach to Life Testing Inferences

In life testing theory we find a large number of random
quantities. In most cases we do not know the distributions and theoretical
characteristics; our aim is to estimate some of these quantities. This
is usually accomplished with the aid of observations on the random
variables. According to the laws of large numbers, an "exact"
determination of a probability, an expected value, etc., would require an
"infinite" number of observations. Having samples of finite size,
we can do no more than estimate the theoretical values in question.
The sample characteristics, or statistics, serve the purpose of
statistical estimation. For a good estimation of theoretical quantities,
a fairly large sample is sometimes needed. In many practical situations
the following two types of estimation problems arise. A certain
quantity, say $\theta$, which is, from the statistical point of view, a
theoretical quantity, has to be determined by means of measurement. Such a
quantity may be, for example, the electrical resistance of a given
device, the life of a given product, etc. The result T of the
measuring procedure is a random variable whose distribution depends on
$\theta$ and perhaps on additional quantities. That is, we have to estimate
the parameter $\theta$ out of a sample $T_1, T_2, \ldots, T_n$ taken on T. In the
other case, the quantity in question is a random variable itself
and in such cases we are interested in the (theoretical) average
value, or the dispersion of T, etc. This means that we have to
estimate the expected value E(T) or Var(T), and perhaps other (constant)
quantities that can be expressed with the aid of the distribution
function of T, like the reliability function. More often for lifetime
distributions, the quantity of interest is a distribution percentile,
also known as the reliable life of the item to be tested, corresponding
to some specified population survival proportion; or it is the
population proportion surviving at least a specified time, say S.

For the classical statistician, the unknown parameter $\theta$ is
considered to be a constant. In estimating a constant value there
are various aspects to consider. If we wish to have an estimator
whose value can be used instead of the unknown parameter in formulas
[certainty equivalent (CE) approach], then the estimator should
have one given value. In this case we speak of point estimation. But
knowing that our estimator is subject to error, sometimes we would
like to have some information on the average deviation from the
value. In this case we have to construct an interval that contains
the unknown parameter, at least with high probability, or give a
measure of the variability of the estimator (such as the standard
error of the estimate). Most of the literature about the traditional
approach to life testing inferences is focused on two areas; one
relates to point and interval estimation procedures for lifetime
distributions and the other relates to methods of testing
statistical hypotheses in reliability (known as "reliability demonstration
tests").

The classical approach to point estimation in life testing
inferences emphasizes that a good estimator should have properties
like unbiasedness, efficiency, consistency and sufficiency [see
Dubey (1968), Bartlett (1937) and Weiss (1961)]. Two methods, the
method of moments and the method of maximum likelihood, are frequently
used to yield estimators with as many as possible of the previously
mentioned properties. Under various sampling assumptions, the
maximum likelihood estimators of the parameters were obtained for the
following distributions: gamma [see Choi and Wette (1969) and Harter
and Moore (1965)]; Weibull [see Bain (1972), Billman et al. (1971),
Cohen (1965), Englehardt (1975), Haan and Beer (1967), Lemon (1975)
and Rockette et al. (1973)]; exponential [see Deemer and Votaw (1955),
El-Sayyad (1967) and Epstein (1957)]; and the normal and lognormal
[see Cohen (1951), Harter and Moore (1966), Lambert (1964) and Tallis
and Young (1962)]. The traditional approach also includes some linear
estimation properties like Best Linear Unbiased (BLU) and Best Linear
Invariant (BLI).
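
The two estimation methods can be contrasted on the two parameter lognormal distribution with a complete (uncensored) sample. The sketch below is an illustration only and does not reproduce the procedures of the cited papers. The maximum likelihood estimates are the mean and the root mean squared deviation of the log lifetimes. The method of moments matches the sample mean and variance of the lifetimes to the lognormal mean exp(mu + sigma^2/2) and variance exp(2 mu) w (w - 1), with w = exp(sigma^2). The sample used is hypothetical.

```python
import math

def lognormal_mle(times):
    """Maximum likelihood estimates of mu and sigma (location parameter zero)."""
    logs = [math.log(t) for t in times]
    n = len(logs)
    mu_hat = sum(logs) / n
    sigma_hat = math.sqrt(sum((z - mu_hat) ** 2 for z in logs) / n)
    return mu_hat, sigma_hat

def lognormal_moments(times):
    """Method of moments estimates of mu and sigma."""
    n = len(times)
    mean = sum(times) / n
    var = sum((t - mean) ** 2 for t in times) / n
    sigma2 = math.log(1.0 + var / mean ** 2)   # variance / mean^2 = w - 1
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

sample = [1.0, math.e, math.e ** 2]            # hypothetical lifetimes
mu_mle, sigma_mle = lognormal_mle(sample)
mu_mom, sigma_mom = lognormal_moments(sample)
```

The two methods generally give different estimates from the same sample, which is one reason the choice among estimators is judged by properties such as efficiency.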

Interval estimation procedures have also been developed for
the parameters of the life distributions. Examples include Bain and
Englehardt (1973), Epstein (1961), Harter (1964) and Mann (1968).
Point or interval estimators for functions of the life distributions,
such as reliable life, reliability function, hazard rate, etc., were
obtained by substituting for the unknown parameters the point or
interval estimators obtained for them [see Johns and Lieberman (1966),
Bartholomew (1963), Grubbs (1971), Harris and Singpurwalla (1968, 1969),
Lawless (1971, 1972), Likes (1967), Mann (1969-a, 1969-b, 1970), Varde
(1969) and Linhart (1965)].

Testing reliability hypotheses is the second major area of
research in the classical approach to life testing. By means of
the methods referenced previously, a test statistic is selected,
regions of acceptance and rejection are set up, and risks of
incorrect decisions are calculated. In addition it is emphasized
that the risks of incorrect decisions are specified before the
sample is obtained, and in this case n, the sample size, is
generally to be determined. Some of the references in this area include
Epstein (1960), Epstein and Sobel (1955), Kumar and Patel (1971),
Lilliefors (1967, 1969), Sobel and Tischendorf (1959), Thoman et al.
(1969, 1970) and Fercho and Ringer (1972).

A large part of the statistical problem in reliability
involves the estimation of parameters in failure models. Each of the
methods of obtaining point estimates previously referenced has
certain statistical properties that make it desirable, at least
from a theoretical viewpoint. Not surprisingly, point estimates
are often made (particularly in reliability) because decisions are
to be based on them. The consequences of the decisions based on the
estimates often involve money, or, more generally, some form of
utility. Hence the decision maker is more interested in the
practical consequence of the estimate than in its theoretical properties.
In particular, he may be interested in making estimates that
minimize the expected loss (cost), but this cannot be accomplished in
general with classical methodology because the methodology does not
admit probability distributions of the parameters.

2.2.4 Bayesian Techniques in Life Testing

The non-Bayesian (classical) approach to estimation
considers an unknown parameter as fixed. This means that classical
interval estimation and hypothesis testing must lean on inductive
reasoning either through the likelihood function or the sampling
distributions. In point estimation, the classical approach must
depend on estimates the criteria for which often are not based on
the practical consequences of the estimates. On the other hand, Bayes
procedures assume a prior distribution on the parameter space, that
is, they consider the parameter as a random variable, and, hence, the
posterior distribution is available. This creates the possibility of
a whole new class of criteria for estimation, namely, minimization
of expected loss, probability intervals and others.

In view of the difficulty in assessing utility or costs of
complex reliability problems, in previous studies Bayesian methods
have been used primarily to provide a means of combining previous
data (expressed as the prior distribution) with observed data
(expressed in the likelihood function) to obtain estimates of
parameters by using the posterior density. However, it must be emphasized that
Bayesian methods are perfectly general in providing whatever the
reliability problem demands.
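
The combination of prior information with observed data can be sketched for one conjugate case. The example assumes exponential lifetimes with failure rate lambda (the reciprocal of the mean life) and a gamma prior on lambda with shape a and rate b. These choices, the function names and the data are assumptions made for illustration. With n observed failure times, the posterior is again gamma, with shape a + n and rate b plus the total time on test.

```python
def update_gamma_prior(a, b, lifetimes):
    """Gamma(a, b) prior on the failure rate -> Gamma(a + n, b + sum of lifetimes)."""
    return a + len(lifetimes), b + sum(lifetimes)

def posterior_mean_rate(a_post, b_post):
    """Posterior mean of the failure rate."""
    return a_post / b_post

def posterior_reliability(t0, a_post, b_post):
    """Posterior expectation of exp(-lambda * t0), the reliability at time t0."""
    return (b_post / (b_post + t0)) ** a_post

# Hypothetical prior (a = 2, b = 100) and three observed lifetimes.
a_post, b_post = update_gamma_prior(2.0, 100.0, [40.0, 60.0, 100.0])
rate_estimate = posterior_mean_rate(a_post, b_post)
reliability_300 = posterior_reliability(300.0, a_post, b_post)
```

The prior acts like a earlier failures observed over b units of test time, which makes its weight relative to the sample easy to interpret.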

There is a loss function that is rather popular in Bayesian
analysis and gives simple results. Suppose that $\hat{\theta}$ is an estimate of
$\theta$ and that the loss function is

(2.2.28)     $L(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$.

This function states that the loss is equal to the square of the
distance of $\hat{\theta}$ from $\theta$. The Bayes approach is to select the estimate
of $\theta$ that minimizes the expected loss with respect to the posterior
distribution. The estimate that accomplishes this is the posterior
mean, that is,

(2.2.29)     $\hat{\theta} = E(\theta \mid t_1, t_2, \ldots, t_n; P)$,

where P represents prior information. The above loss function is often
called the quadratic loss function and the posterior mean is termed
the Bayes estimate. If the loss function is of the form

(2.2.30)     $L(\hat{\theta}, \theta) = |\hat{\theta} - \theta|$,

the estimate of $\theta$ that minimizes the expected loss is the median of
the posterior distribution. Canfield (1970) developed a Bayesian
estimate of reliability for the exponential case using this loss function.
The resulting estimate is seen to be the MVUE of reliability when the
prior is flat. A third and simple case is the asymmetric linear,

(2.2.31)     $L(\hat{\theta}, \theta) = \begin{cases} k_u (\theta - \hat{\theta}) & \text{if } \theta \geq \hat{\theta}, \\ k_o (\hat{\theta} - \theta) & \text{if } \theta < \hat{\theta}. \end{cases}$

The estimate of $\theta$ that minimizes the expected loss is the $k_u/(k_u + k_o)$
fractile [see Raiffa and Schlaifer (1961)]. Beyond these three
simple cases, things become difficult in regard to loss function for
two reasons:

(i) difficulties in assessing a realistic loss function
and

(ii) mathematical intractability.

The expected loss is generally a random variable a priori since it
depends on the as yet unobserved sample data. The unconditional
expectation (with respect to the sample) of the expected loss is called
the "Bayes risk" and is minimized by the Bayes estimate.
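
The three simple loss functions can be compared on a small discrete posterior distribution. In the sketch below the posterior, the cost constants and the search grid are hypothetical. A brute-force search over the grid finds the estimate with the smallest posterior expected loss, which should fall at the posterior mean for quadratic loss, at the posterior median for absolute loss, and at the fractile given by the ratio of the underestimation cost to the sum of the two costs for asymmetric linear loss.

```python
def expected_loss(estimate, support, probs, loss):
    """Posterior expected loss of an estimate for a discrete posterior."""
    return sum(p * loss(estimate, theta) for theta, p in zip(support, probs))

def quadratic(est, theta):
    return (est - theta) ** 2

def absolute(est, theta):
    return abs(est - theta)

def asymmetric_linear(k_under, k_over):
    """k_under per unit of underestimation, k_over per unit of overestimation."""
    def loss(est, theta):
        return k_under * (theta - est) if est <= theta else k_over * (est - theta)
    return loss

def best_on_grid(support, probs, loss, grid):
    """Grid point with the smallest posterior expected loss."""
    return min(grid, key=lambda est: expected_loss(est, support, probs, loss))

support = [1.0, 2.0, 3.0, 4.0, 10.0]     # hypothetical posterior support
probs = [0.1, 0.2, 0.3, 0.3, 0.1]        # hypothetical posterior probabilities
grid = [i / 10 for i in range(121)]      # candidate estimates 0.0, 0.1, ..., 12.0

best_quadratic = best_on_grid(support, probs, quadratic, grid)
best_absolute = best_on_grid(support, probs, absolute, grid)
best_asymmetric = best_on_grid(support, probs, asymmetric_linear(3.0, 1.0), grid)
```

Here the posterior mean is 3.6, the median is 3, and the 0.75 fractile is 4, so a heavier penalty on underestimation moves the estimate upward.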

Bayes methods have been used in a variety of areas of
reliability. Most uses can be characterized as point or interval
estimation of parameters of life distributions or of reliability functions.
Examples include Breipohl et al. (1965), who studied the
behavior of a family of Bayesian posterior distributions. In addition,
the properties of the mean of the posterior distribution as a point
estimate and a method for constructing confidence intervals were
given. The problem of hypothesis testing was considered, among others,
by MacFarland (1972). He provided a simple exposition of the
rudiments of applying Bayes equation to hypotheses concerning
reliability.

The Bayesian approach has also been applied to parameter
estimation and reliability estimation of some known distributions
like gamma, Poisson, lognormal and others. Lwin and Singh (1974)
considered a Bayesian analysis of a two-parameter gamma model in
life testing context with special emphasis on estimation of the
reliability function. The Poisson distribution has received the
attention of Canavos (1972, 1973). In the first article a smooth
empirical Bayes estimator is derived for the hazard rate. The
reliability function is also estimated either by using the empirical
Bayes estimate of the parameters, or by obtaining the expectation
of the reliability function. Results indicate a significant
reduction in mean squared error of the empirical Bayes estimates over
the maximum likelihood estimates. A similar result was also derived
for the exponential distribution by Lemon (1972) and by Martz (1975).
Next, Canavos developed Bayesian procedures for life testing with
respect to a random intensity parameter. Bayes estimators were
derived for the Poisson parameters and reliability function based
on uniform and gamma prior distributions. Again, as expected, the
Bayes estimators have mean squared errors (MSE) that are appreciably
smaller than those of the minimum variance unbiased estimator (MVUE)
and of the maximum likelihood estimator (MLE).

Zellner  (1971)  has  studied  the  Bayesian  estimation  of  the 
parameters  of  the  lognormal  distribution.  Employing  a  flat  prior, 
Zellner  found  that  the  MSE  estimators  of  the  parameters  are  the 
optimal  Bayesian  estimators  when  a  relative  squared  error  loss 
function  is  used. 

The Weibull and exponential distributions have received most
of the attention of authors who have studied life distributions
from a Bayesian viewpoint. The Weibull process with unknown scale
parameter is taken as a model by Soland (1968) for Bayesian decision
theory. The family of natural conjugate prior distributions for the
scale parameter is used in prior and posterior analysis. In addition,
preposterior analysis is given for an acceptance sampling problem
with utility linear in the unknown mean of the Weibull process. Soland
(1969) extended the analysis by treating both the shape and scale
parameters as unknown. As was previously known, it is not possible
to find a family of continuous joint distributions on the two
parameters that is closed under sampling, so a family of prior
distributions is used that places continuous distributions on the scale
parameter and discrete distributions on the shape parameter. Prior
and posterior analysis are examined and seen to be no more difficult
than for the case in which only the scale parameter is treated as
unknown, but preposterior analysis and the determination of optimal
sampling plans are considerably more complicated in this case.

In Bury (1972), a two-parameter Weibull distribution is
assumed to be an appropriate statistical life model. A Bayesian decision
model is constructed around a conjugate probability density function
for the Weibull hazard rate. Since a single sufficient statistic of
fixed dimensionality does not exist for the Weibull model, Bury was
able to consider only two sampling plans in his preposterior analysis:
obtain one further observation or terminate testing. Bury points out
that small sample Bayesian analysis tends to be more accurate than
classical analysis because of the additional prior information utilized
in the analysis. Bayes credible bounds for the scale parameter and
for the reliability function are derived by Papadopoulos and Tsokos
(1975).

Reliability data often include information that the failure
event has not yet occurred for some items, while observations of
complete lifetimes are available for other items. Cozzolino (1974)
addressed this problem from a Bayesian point of view, considering
density functions that have failure rate functions consisting of a known
function multiplied by an unknown scale factor. It is shown that a gamma
family of priors is conjugate for the unknown scale parameter for both
complete and incomplete experiments. A very flexible and convenient
model results from the assumption of a piecewise constant failure rate
function.

Life tests that are terminated at preassigned time points or
after a preassigned number of failures are sometimes found in reliability
theory. Bhattacharya (1967) provided a Bayesian analysis of the
exponential model based on this kind of life test. He showed that the
reliability estimate for a diffuse prior (which is uniform over the
entire positive line) closely resembles the classical MVUE, and he
considered the role of prior quasi-densities when a life tester has
no prior information. Bhattacharya points out that the use of a constant
density over the positive real line has been suggested to express
ignorance but that it causes problems. For example, it cannot be
interpreted as a probability density since it assigns infinite measure
to the parameter space. [See Box and Tiao (1972).]

A paper by Dunsmore (1974) stands out from among the other
Bayesian papers in life testing and is particularly pertinent to
the life testing application in this thesis. This article is an
important exception because it carries the Bayesian approach to its
natural conclusion by determining prediction intervals for future
observations in life testing using the concept of the Bayesian
predictive distribution. One objective of prediction is to provide some
estimate, either point or interval, for future observations of an
experiment F based on the results obtained from an informative
experiment E. As we mentioned before, the classical approach to prediction
involves the use of tolerance regions. [See Aitchison (1966), Folks
and Browne (1975), Guenther et al. (1976) and Hewett and Moeschberger
(1976).] In these we obtain a prediction interval only, and the
measure of confidence refers to repetitions of the whole experimental
situation. The Bayesian approach, on the other hand, allows us to
incorporate further information which might be available through a
prior distribution and leads to a more natural interpretation.

Footnote (prior quasi-density): If $g(\theta)$ is any non-negative function
defined on the parameter space $\Omega$ such that $g(\theta) \neq 0$ for all
$\theta \in \Omega$, then $g(\theta)$ is called a prior quasi-density.

Let $t_1, \ldots, t_n$ be a random sample from a distribution with
probability density function $P(t \mid \theta)$, $(t \in T;\ \theta \in \Theta)$, and let
$y_1, y_2, \ldots, y_m$ be a second independent random sample of "future"
observations from a distribution with probability density function
$P(y \mid \theta)$, $(y \in Y;\ \theta \in \Theta)$. Our aim is to make predictions about
some function of $y_1, y_2, \ldots, y_m$. The Bayesian approach assumes that a
prior density function $P(\theta)$, $(\theta \in \Theta)$, is available that measures
our uncertainty about the value of $\theta$. If the information in E is
summarized by a sufficient statistic $t$, then a posterior distribution
$P(\theta \mid t)$ is available. Suppose now that we wish to predict some
statistic $y$ defined on $y_1, y_2, \ldots, y_m$. Then the predictive density
function for $y$ is given by

(2.2.32)  $P(y \mid t) = \int_{\Theta} P(y \mid \theta)\, P(\theta \mid t)\, d\theta$.

A Bayesian prediction interval of cover $\beta$ is then defined as an
interval $I$ such that

(2.2.33)  $P(I \mid t) = \int_{I} P(y \mid t)\, dy = \beta$.

[See, for example, Aitchison and Sculthorpe (1965), Aitchison (1966)
and Guttman (1970).] It should be emphasized that in the Bayesian
approach the complete inferential statement about $y$ is given by the
predictive density function $P(y \mid t)$. Any prediction interval is only
a summary of the full description $P(y \mid t)$.

Footnote (sufficient statistic): Such a sufficient statistic will always
exist since, for example, $t$ could be the vector $(t_1, t_2, \ldots, t_n)^T$.
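As a concrete illustration of (2.2.32) and (2.2.33), the sketch below assumes an exponential life model with failure rate $\theta$ and a gamma prior on $\theta$. This conjugate pair is chosen here only because the integral has a closed form to check against; the data values and function names are hypothetical and do not come from Dunsmore's paper.

```python
import math


def posterior_params(lifetimes, prior_shape, prior_rate):
    """Gamma posterior (shape, rate) for the failure rate theta of an
    exponential life model, given complete lifetimes (assumed example)."""
    return prior_shape + len(lifetimes), prior_rate + sum(lifetimes)


def gamma_density(theta, shape, rate):
    """Gamma density with the given shape and rate, for theta > 0."""
    log_density = (shape * math.log(rate) - math.lgamma(shape)
                   + (shape - 1.0) * math.log(theta) - rate * theta)
    return math.exp(log_density)


def predictive_density_numeric(y, shape, rate, steps=20000):
    """Midpoint-rule approximation of (2.2.32): the integral over theta
    of P(y | theta) * P(theta | t)."""
    # Truncate the integral well beyond the bulk of the posterior mass.
    theta_max = (shape + 12.0 * math.sqrt(shape)) / rate
    h = theta_max / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += theta * math.exp(-theta * y) * gamma_density(theta, shape, rate)
    return total * h


def predictive_density_exact(y, shape, rate):
    """Closed form of the same integral for this conjugate pair."""
    return shape * rate ** shape / (rate + y) ** (shape + 1.0)


def upper_prediction_limit(cover, shape, rate):
    """Upper end c of the interval I = [0, c] with P(I | t) = cover,
    one of the many intervals satisfying (2.2.33)."""
    return rate * ((1.0 - cover) ** (-1.0 / shape) - 1.0)


# Hypothetical lifetimes and prior parameters, for illustration only.
shape, rate = posterior_params([1.2, 0.7, 2.5, 1.9], prior_shape=2.0, prior_rate=1.0)
print(predictive_density_numeric(1.0, shape, rate))
print(predictive_density_exact(1.0, shape, rate))
print(upper_prediction_limit(0.90, shape, rate))
```

The numerical and closed-form values should agree closely, which is a useful check that the predictive density averages $P(y \mid \theta)$ over the remaining uncertainty about $\theta$ instead of plugging in a point estimate.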

In general there will be many intervals $I$ that satisfy (2.2.33).
Dunsmore considers most plausible Bayesian prediction intervals
(commonly known as highest posterior density (HPD) intervals) of cover
$\beta$, which have the form

(2.2.34)  $I = \{ y : P(y \mid t) \geq \lambda \}$,

where $\lambda$ is determined by $P(I \mid t) = \beta$.
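The rule in (2.2.34) can be approximated on a grid: keep the points of highest predictive density until their total probability reaches the required cover. The sketch below is one simple way to do this and is not taken from Dunsmore; the normal predictive density, the grid limits and the function names are assumptions made for the example, and the returned end points describe the region correctly only when the density is unimodal.

```python
import math


def hpd_interval(density, lower, upper, cover, points=20001):
    """Grid approximation to (2.2.34). Returns (low, high, lam), where lam
    is the density level at which the accumulated probability reaches cover."""
    h = (upper - lower) / (points - 1)
    ys = [lower + i * h for i in range(points)]
    ps = [density(y) for y in ys]
    # Visit grid points from highest to lowest density.
    order = sorted(range(points), key=lambda i: ps[i], reverse=True)
    mass = 0.0
    kept = []
    for i in order:
        kept.append(i)
        mass += ps[i] * h          # each grid point stands for a cell of width h
        if mass >= cover:
            break
    lam = ps[kept[-1]]
    return ys[min(kept)], ys[max(kept)], lam


def normal_density(y, mean=10.0, sd=2.0):
    """Hypothetical predictive density used only to exercise the routine."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


low, high, lam = hpd_interval(normal_density, 0.0, 20.0, cover=0.95)
print(low, high, lam)
```

For this symmetric example the interval should come out close to 10 ± 3.92, the familiar central 95 percent interval; for a skewed predictive density the HPD interval would no longer be central.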

In conclusion we might say that the uses of Bayesian methods
in life testing have been limited. However, in those cases where Bayes
estimators have been found, they performed better, according to
classical criteria, than the conventional ones. The use of loss functions
has not been analyzed deeply for the reasons mentioned before; namely
that the loss function is usually complex and unknown, and that even
when the loss function is known the Bayes estimate is sometimes
difficult to find. Some of these problems will be solved with the
development of mathematical theory and probably with the development of
computer systems. Only the Dunsmore paper fully used the Bayesian
methodology to obtain prediction intervals that consider all available
information and fully recognize the remaining parameter uncertainty.

Footnote to (2.2.32): It is implicitly assumed in (2.2.32) that,
conditional on $\theta$, $y$ and $t$ are independent.

All of the papers discussed in the previous section
considered a stationary situation. That is, the known parameters and
the distributions used are assumed to remain the same across all
time periods. It would be of value to study the nonstationary case,
where the parameters are changing in time and possibly the
distributions could also change in time. It is important to recognize,
however, that the problems now faced with the stationarity
assumption will probably be greater when that assumption is relaxed.
Nevertheless, this dynamic system is well worth investigating.

2.3 Modeling of Nonstationary Processes
For many real world data generating processes the assumption
of stationarity is questionable. Take for instance life testing
models. When it is assumed that the life of certain commodities
follows a lognormal distribution, for example, the stationarity
assumption could be expected to hold over short periods of time; but
in most cases it would be expected that, for a lengthy period,
stationarity would be a doubtful assumption. If the model represents
the life of perishable products, like food for example, then it
would be expected that environmental factors like heat and humidity
could change and affect the characteristics of the life distribution
of the product or affect the input factors used in the manufacturing
process. Furthermore, the wearout of the machines used in the
manufacture of the products could cause changes in the quality of the
products and hence in the parameters of the life distributions.

Random parameter variation is surely a reasonable
assumption when we are concerned with economic variables, like those
used in Cost-Volume-Profit analysis. A wide spectrum of circumstances
could be mentioned where the economic environment is gradually
affected. For example, the level of economic development changes
gradually in a country and consequently brings gradual changes in
related variables like income, consumption and price. Also, consumers'
tastes and preferences evolve relatively slowly as social and economic
conditions change and as new marketing channels or techniques are
developed. The gradual increase in technology available to
industry and to the government may produce changes that are not dramatic
but that will have some influence in any particular period of time.
In other words, it seems reasonable to assume that in at least some
situations the distribution functions of variables like sales, price
or costs could be gradually changing in time. It is important to
emphasize that we are referring to gradual changes, the effects of
which are not perfectly predictable in advance for a particular period.

If a data generating process characterized by some parameter
$\theta$ is nonstationary, then it is not particularly realistic to make
inferences and decisions concerning $\theta$ as if $\theta$ only took on a single
value. Instead we should be concerned with a sequence $\theta_1, \theta_2, \ldots$ of
values of $\theta$ corresponding to different time periods, assuming the
characteristics of the process vary across time but are relatively
constant within a given period. Some researchers have studied this
problem with particular stochastic processes.

Chernoff and Zacks (1964) studied what they called a "tracking"
problem. Observations are taken on the successive positions of an
object traveling on a path, and it is desired to estimate its current
position. If the path is smooth, regression estimates seem appropriate.
However, if the path is subject to occasional changes in direction,
regression will give misleading results. Their objective was to arrive
at a simple formula which implicitly accounts for possible changes in
direction and discounts observations taken before the latest change.
Successive observations were assumed to be taken on $n$ independently
and normally distributed random variables with means $\mu_1, \mu_2, \ldots, \mu_n$.
Each mean is equal to the preceding mean except when an occasional
change takes place. The object is to estimate the current mean $\mu_n$.
They studied the problem from a Bayesian point of view and made the
following assumptions: the time points of change obey an arbitrary
specified a priori probability distribution; the amounts of change
in the means (when changes take place) are independently and normally
distributed random variables with zero mean; and the current mean
$\mu_n$ is a normally distributed random variable with zero mean. Using
a quadratic loss function and a uniform prior distribution for $\mu_n$ on
the whole real line, they derived a Bayes estimator of $\mu_n$. In
addition they derived the minimum variance linear unbiased (MVLU)
estimator of $\mu_n$. Comparing both estimators, they found that although
the MVLU estimator is considerably simpler than the Bayes estimator,
when the expected number of changes in the mean is neither zero nor
$n-1$ the Bayes estimator is more efficient than the MVLU.

Chernoff and Zacks also studied an alternative problem in which the
prior distribution of time points of change is such that there is at
most one change. This problem leads to a relatively simple Bayes
estimator. However, difficulties may arise if this estimator is applied
when there are actually two (or more) changes. The suggested technique
starts at the end of a series, searches back for a change in mean and
then estimates the mean value of the series forward from the point at
which such a change is assumed to have occurred. They designed a procedure
to test whether a change in mean has occurred and found a simpler test
than the one used by Page (1954, 1955). Most of the results appearing in
this paper were derived in a previous paper by Barnard (1959) in a
somewhat different manner, but the general results are essentially the same.

The paper by Chernoff and Zacks motivated some
research in the following years. Mustafi (1968) considered a
situation in which a random variable is observed sequentially over time
and the distribution of this random variable is subject to a
possible change at every point in the sequence. The study of this
problem is centered about the model introduced by Chernoff and Zacks.
Three aspects of the problem were considered by Mustafi. First, he
considered the problem of estimating the current value of the mean
on the basis of a set of observations taken up to the present. Chernoff
and Zacks assumed that certain parameters occurring in the model were
known. Mustafi derived a procedure for estimating the current
value of the mean on the basis of a set of observations taken at
successive time points when nothing is known about the other
parameters occurring in the model. Second, Mustafi estimated the various
points of change in the framework of an empirical Bayes procedure and
used an idea similar to that of Tainiter (1966) to derive a sequence
of tests to be applied at each stage. Third, he considered $n$ independent
observations of a random variable that belongs to the one-parameter
exponential family, taken at successive time points. He examined the
problem of testing the equality of these $n$ parameters against the
alternative that the parameter has changed $r$ times at some unknown
points, where $r$ is some finite positive integer less than $n$. He
developed a test procedure generalizing the techniques used by Kander
and Zacks (1966) and Page (1955).

Hinich and Farley (1966) also studied the problem of estimation
models for time series with nonstationary means. They assumed
a model similar to the one developed by Chernoff and Zacks, except
that they assumed that the number of points of change per unit time
is Poisson distributed with a known shift rate parameter. They found
an estimator for the mean which is unbiased and efficient. It also
turned out to be a linear combination of the vector of observations.
The Farley-Hinich technique attempts to estimate jointly the level
of the mean at the beginning of a series as well as the size of the
change (if any).

Farley and Hinich in a later paper (1970) compared the method
developed in (1966) with the one presented by Chernoff and Zacks (1964)
and later generalized by Mustafi (1968). Some ways were examined to
systematically track time series which may contain small stochastic
mean shifts as well as random measurement errors. A "small" shift
is one which is small relative to measurement error. Three approaches
were tested with artificial data, by means of Monte Carlo methods,
using mean shifts which were rather small, that is, mean shifts which
were half the magnitude of random measurement error variance. Several
false starts with actual marketing data showed that there was an
identification problem in providing an adequate test of the procedures'
performance, and artificial data of known configuration provided a
more natural starting point. Two techniques (one developed by the
authors and the other by Chernoff and Zacks) involved formal estimation
under the assumption that there was at most one discrete jump in a
data record of fixed length of the type often stored in an information
system. Both techniques performed reasonably well when the rate of
shift occurrence was known, but both techniques are very sensitive
to prior specification of the rate at which shifts occur in
terms of both classes of errors, that is, missing shifts which
occur and identifying "shifts" which do not occur. Knowing the
shift rate precisely and knowing that more than one shift in a record
is extremely unlikely are two very severe restrictions for many
applications. A simpler filter technique was tested similarly, with more
promising results in terms of avoiding both classes of errors. The
filter approach involved first smoothing the series and then
implementing ad hoc decision rules based on consecutive occurrences of
smoothed values falling outside a predetermined range around the
moving average.
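A filter of this general kind can be sketched as follows. This is not the rule tested by Farley and Hinich, whose details are not given here: the exponential smoothing constant, the window of the moving average, the width of the band and the required run length are all hypothetical choices made for illustration.

```python
def detect_shift(series, alpha=0.3, window=10, band=1.0, run_length=3):
    """Return the index at which a mean shift is signalled, or None.

    The series is smoothed exponentially; a shift is signalled once
    `run_length` consecutive smoothed values lie more than `band` away
    from the moving average of the previous `window` observations.
    """
    smoothed = None
    run = 0
    for t, x in enumerate(series):
        # Step 1: smooth the series.
        smoothed = x if smoothed is None else alpha * x + (1.0 - alpha) * smoothed
        if t < window:
            continue  # not enough history yet to form the moving average
        # Step 2: ad hoc decision rule around the trailing moving average.
        baseline = sum(series[t - window:t]) / window
        if abs(smoothed - baseline) > band:
            run += 1
            if run >= run_length:
                return t
        else:
            run = 0
    return None


# Noise-free artificial record with one jump at index 30 (illustration only).
record = [0.0] * 30 + [2.0] * 30
print(detect_shift(record, alpha=0.5, window=10, band=1.0, run_length=3))
```

Widening the band or lengthening the required run reduces false signals at the cost of slower detection, which is the trade-off between the two classes of errors described above.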

Harrison and Stevens have produced two important papers about
Bayesian forecasting using nonstationary models. In the first of these
papers (1971), they described a new approach to short-term forecasting
based on Bayesian principles in conjunction with a multi-state
data-generating process. The various states correspond to the occurrence of
transient errors and step changes in trend and slope. The performance
of conventional systems, like the growth models of Holt (1957), Brown
(1963) and Box-Jenkins (1970), is often upset by the occurrence of
changes in trend and slope or transients. In Harrison and Stevens'
approach events of this nature are modelled explicitly, and
successive data points are used to calculate the posterior probabilities
of such events at each instant of time.

In the second paper (1976), Harrison and Stevens describe a
more general approach to forecasting. The principles of Bayesian
forecasting are discussed and the formal inclusion of the "forecaster"
in the forecasting system is emphasized as a major feature. The
critical distinction is that between a statistical forecasting method and
a forecasting system. The former transforms input data into output
information in a purely mechanical way. The latter, however, includes
people: the person responsible for the forecast and all the people
concerned with using the forecasts and supplying information relevant
to the resulting actions. It is necessary that people can communicate
their information to the method and that the method clearly
communicates the uncertain information in such a way that it is readily
interpreted and accepted by decision makers. The basic model, called
by them "the dynamic linear model", is defined together with the Kalman
filter recurrence relations, and a number of model formulations are
given based on their result. They first phrase the models in terms
of their "natural" parameters and structure, and then translate them
into the dynamic linear model form. Some of the models discussed by
them are: a) regression models, b) the steady model, c) the linear
growth model, d) the general polynomial models, e) seasonal models,
f) autoregressive models, and g) moving average models.
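To make the Kalman filter recurrences concrete, the sketch below works through the simplest case, a steady model in which the level follows a random walk and is observed with noise. It is a minimal illustration under assumed notation (known observation variance `obs_var` and shift variance `shift_var`, both names hypothetical), not Harrison and Stevens' general formulation.

```python
import math


def steady_model_filter(observations, obs_var, shift_var, prior_mean, prior_var):
    """Kalman recurrences for y_t = mu_t + e_t, mu_t = mu_{t-1} + w_t,
    with e_t ~ N(0, obs_var) and w_t ~ N(0, shift_var).
    Returns the posterior (mean, variance) of mu_t after each observation."""
    mean, var = prior_mean, prior_var
    history = []
    for y in observations:
        prior_var_t = var + shift_var          # the random shock adds uncertainty
        forecast_var = prior_var_t + obs_var   # variance of the forecast of y_t
        gain = prior_var_t / forecast_var      # weight given to the new observation
        mean = mean + gain * (y - mean)
        var = prior_var_t * obs_var / forecast_var
        history.append((mean, var))
    return history


# Artificial data; the posterior variances do not depend on the values observed.
data = [math.sin(0.3 * i) for i in range(200)]
stationary = steady_model_filter(data, 1.0, 0.0, 0.0, 100.0)
nonstationary = steady_model_filter(data, 1.0, 1.0, 0.0, 100.0)
print(stationary[-1][1], nonstationary[-1][1])
```

With `shift_var` equal to zero the posterior variance keeps shrinking, roughly like `obs_var / n`. With a positive `shift_var` it settles at a positive limit, so the gain stays away from zero and recent observations continue to receive appreciable weight.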

Multiprocess models introduce uncertainty as to the underlying
model itself, and this approach is described in a more general
fashion than in their 1971 paper. In the 1976 paper they present a
Bayesian approach to forecasting which not only includes many
conventional methods, as presented before, but possesses a remarkable
range of additional facilities, not the least being its ability to
respond effectively in the start-up situation where no prior data
history (as distinct from information) is available. The essential
foundations of the method are:

(a) a parametric (or state space) model, as distinct from
a functional model;

(b) probabilistic information on the parameters at any given
time;

(c) a sequential model definition which describes how the
parameters change in time, both systematically and as a result of
random shocks;

and

(d) uncertainty as to the underlying model itself, as
between a number of discrete alternatives.

Kamat (1976) developed a smoothed Bayes control procedure for
controlling the output of a production process when the quality
characteristic is continuous with a linear shift in its basic level. The
procedure uses Bayesian estimation with exponential smoothing for
updating the necessary parameter estimates. The application of the
procedure to real life data is illustrated with an example.
Applications of the traditional $\bar{x}$-chart and the cumulative sum control
chart to the same data are also illustrated for comparison.

In Chapter Three of this dissertation we develop a Bayesian
model of nonstationarity for normal and lognormal processes. We build
our results directly on two papers, Winkler and Barry (1973) and Barry
and Winkler (1976). In the first paper they developed a Bayesian model
for nonstationary means in a multinormal data-generating process and
demonstrated that the presence of nonstationary means can have an impact
upon the uncertainty associated with a given random variable that has
a normal distribution. Moreover, the nonstationary model considered by
them seems to have more realistic properties than the corresponding
stationary model. For example, they found that in the nonstationary
model the recent observations are given more weight than the distant
ones in determining the mean of the distribution at any given time,
and the uncertainty about the parameters of the process is never
completely removed. Barry and Winkler (1976) were concerned with the
effects of nonstationarity on portfolio decisions. The use of a Bayesian
approach to statistical inference and decision provides a convenient
framework for studying the problem of changing parameters, both in
terms of forecasting security prices and in terms of portfolio decision
making. In this thesis a number of extensions to their results are
made, thereby removing some of the restrictiveness of their results,
and applications are considered in the areas of CVP analysis and life
testing.


CHAPTER  THREE 

NONSTATIONARITY  IN  NORMAL  AND  LOGNORMAL  PROCESSES 
3.1 Introduction
The normal distribution is considered by many to be an
important distribution. The earliest workers regarded the distribution
only as a convenient approximation to the binomial distribution. However,
with the work of Laplace and Gauss its broader theoretical importance
spread. The normal distribution became widely and uncritically accepted
as the basis of much practical statistical work. More recently a more
critical spirit has developed, with more attention being paid to systems
of "skew (asymmetric) frequency curves". This critical spirit has
persisted, but is offset by developments in both theory and practice. The
normal distribution has a unique position in probability theory, and can
be used as an approximation to many other distributions. In real world
problems, "normal theory" can frequently be applied, with small risk of
serious errors, when substantially non-normal distributions correspond more
closely to observed values. This allows us to take advantage of the elegant
nature and extensive supporting numerical tables of normal theory. Most
theoretical arguments for the use of the normal distribution are based on
forms of central limit theorems. These theorems state conditions under
which the distribution of standardized sums of random variables tends to
a unit normal distribution as the number of variables in the sum increases,
that is, conditions sufficient to ensure an asymptotic unit normal
distribution.

The normal distribution, for the reasons given above, has
been widely used, and enumerating the fields of application would be
lengthy and not really informative. However, we do emphasize that the
normal distribution is almost always used as an approximation, either
to a theoretical or an unknown distribution. The normal distribution
is well suited to this because its theoretical analysis is fully worked
out and often simple in form. Where these conditions are not fulfilled,
substitutes for normal distributions should be sought. Even when
normal distributions are not used, results corresponding to "normal theory"
are often useful as standards of comparison.

The use of normal distributions when the coefficient of variation
is large presents many difficulties in some applications. For instance,
observed values more than twice the mean would then imply the existence
of observations with negative values. Frequently this is a logical absurdity.
The lognormal distribution, as defined in equation 2.2.20, is in at least
one important respect a more realistic representation of distributions
of characters that cannot assume negative values than is the normal
distribution. A normal distribution assigns positive probability to such events,
while the lognormal distribution does not. The use of the lognormal
distribution has been investigated as a possible solution to this problem [see
Cohen (1951), Galton (1879), Jenkins (1932) and Yuan (1933)]. In a
review of the literature Gaddum (1945) found that the lognormal
distribution could be used to describe several processes. In Chapter Two
we presented a list of some of the applications of this distribution
to real life problems. Among those applications we emphasized its
use in Cost-Volume-Profit analysis and in life testing models.
Furthermore, by taking the spread parameter small enough, it is possible
to construct a lognormal distribution closely resembling any normal
distribution. Hence, even if a normal distribution is felt to be really
appropriate, it might be replaced by a suitable lognormal distribution.
As was mentioned in Chapter Two, most research concerned with
the normal and lognormal distributions has considered only stationary
situations. That is, the parameters (known or assumed to be known)
and distributions used are assumed to remain the same in the future.
In this third chapter we intend to build a nonstationary model for
normal and lognormal processes from a Bayesian point of view. Section
3.2 sets the stage for the development of the nonstationary model. In
it, we describe essential features of the Bayesian analysis of normal
and lognormal processes, including prior, posterior and predictive
distributions. Two uncertainty situations are considered in this section:
in one the shift parameter, $\mu$, is assumed to be unknown and the spread
parameter, $\sigma$, is assumed to be known; in the other, both parameters
are assumed to be unknown. In Section 3.3, we develop a particular
nonstationary model for the shift parameter of the lognormal distribution,
again under the same two uncertainty situations, and provide a
comparison of the results with a stationary model.

3.2 Bayesian Analysis of Normal and Lognormal Processes

Before the last decade, most of the Bayesian research dealing
with problems of statistical inference and decisions concerning a
parameter $\theta$ assumed that $\theta$ takes on a single value; those
models are called stationary models. For example, $\theta$ may represent
the proportion of defective items produced by a certain manufacturing
process, the mean monthly profits of a given company, the mean life of a
manufactured product, and so on. In each case $\theta$ is assumed to be
fixed but unknown. A formal Bayesian statistical analysis articulates the
evidence of a sample to be analyzed with evidence other than that of the
sample; it is felt that there usually is prior evidence. The non-sample
evidence is assessed judgmentally or subjectively and is expressed in
probabilistic terms by means of: (1) a data distribution that specifies
the probability of any sample result conditional on certain parameters;
and (2) a prior distribution that expresses our uncertainty about the
parameters. When judgment, in the form of the assessment of a likelihood
function to apply to the data, is combined with the evidence of a
sample, we have the likelihood function of the sample. The likelihood
function of the sample is combined with the prior distribution via
Bayes' theorem to produce a posterior distribution for the parameters
of the data distribution, and this is the typical output of a formal
Bayesian analysis. If we assume that the prior distribution for the
parameters of the data distribution is continuous, then we may express
Bayes' theorem as


(3.2.1)  $f(\theta \mid x) = \dfrac{f(\theta \mid \tau)\, f(x \mid \theta)}{f(x \mid \tau)}, \qquad f(x \mid \tau) \neq 0,$

where

$x$ denotes the vector of sample observations,

$\theta$ represents all the unknown parameters,

and

$\tau$ represents the known parameters of the prior
distribution of $\theta$.

We can interpret $f(x \mid \theta)$ in two ways: (1) for given $\theta$,
$f(x \mid \theta)$ gives the distribution of the random vector $x$; (2) for
given $x$, $f(x \mid \theta)$ as a function of $\theta$, together with all
positive multiples, is in the usual usage the likelihood function of the
sample.

The prior probability of the sample, $f(x \mid \tau)$, is computed from

(3.2.2)  $f(x \mid \tau) = \int_{\Theta} f(\theta \mid \tau)\, f(x \mid \theta)\, d\theta,$

from which we see that $f(x \mid \tau)$ can be interpreted as the expected
value of the likelihood in the light of the prior distribution.
Alternatively, $f(x \mid \tau)$ can be interpreted as the marginal
distribution of the random vector $x$ with respect to the joint
distribution,

(3.2.3)  $f(x, \theta \mid \tau) = f(\theta \mid \tau)\, f(x \mid \theta).$

Since (3.2.2) can be computed in advance of the sample for any $x$,
we shall frequently refer to the marginal distribution of $x$ as the
predictive distribution implied by the specified prior distribution
and data distribution.

If we have a posterior distribution $f(\theta \mid x)$ and if a future
random vector $\tilde{x}$ is to come from $f(\tilde{x} \mid \theta)$, which
may or may not be the same data distribution as in (3.2.2), we may compute

(3.2.4)  $f(\tilde{x} \mid x) = \int_{\Theta} f(\theta \mid x)\, f(\tilde{x} \mid \theta)\, d\theta.$

We refer to the distribution so defined as the predictive distribution
of a future sample implied by the posterior distribution. It must be
understood that (3.2.2) and (3.2.4) are but two instances of the same
relationship; sometimes it is worth distinguishing the practical
problems arising when predictions refer to the present sample from those
arising in connection with predictions about a future sample, that is,
a "not-yet-observed" sample. The revision of the prior distribution
gives the statistician a method for drawing inferences about $\theta$, the
uncertain expression, quantity or parameter of interest, and for
decisions related to $\theta$.
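
The relations (3.2.1), (3.2.2) and (3.2.4) can be illustrated with a minimal sketch in which $\theta$ is the proportion of defective items and the integrals are replaced by sums over a small grid of candidate values. The grid, the prior weights and the sample (3 defectives in 10 items) are hypothetical numbers chosen only for illustration, and the function names are not taken from the text.

```python
from math import comb

def bayes_discrete(thetas, prior, k, n):
    """Posterior weights on a finite grid of theta for k defectives in n items."""
    # f(x | theta): binomial probability of the observed sample
    like = [comb(n, k) * th**k * (1 - th)**(n - k) for th in thetas]
    # (3.2.2) with the integral replaced by a sum: prior predictive f(x | tau)
    marginal = sum(p * l for p, l in zip(prior, like))
    # (3.2.1): posterior = prior * likelihood / marginal
    posterior = [p * l / marginal for p, l in zip(prior, like)]
    return posterior, marginal

def prob_next_defective(thetas, weights):
    """(3.2.4) on the grid: P(next item is defective) under the given weights."""
    return sum(th * w for th, w in zip(thetas, weights))

thetas = [0.1, 0.3, 0.5]   # hypothetical candidate values of theta
prior = [0.5, 0.3, 0.2]    # hypothetical prior weights
posterior, marginal = bayes_discrete(thetas, prior, k=3, n=10)

print("posterior weights:", posterior)
print("prior predictive probability of the sample:", marginal)
print("P(next defective) before the sample:", prob_next_defective(thetas, prior))
print("P(next defective) after the sample:", prob_next_defective(thetas, posterior))
```

The same function serves for the prior predictive and the posterior predictive probability, which mirrors the remark that (3.2.2) and (3.2.4) are two instances of one relationship.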

In general, then, we may say that the term Bayesian refers to
any use or user of prior distributions on a parameter space (although
there is some nonparametric Bayesian material also) with the associated
application of Bayes' theorem in the analysis of an inferential
or decision problem under uncertainty. Such an analysis rests on the
belief that in most practical situations the statistician will possess
some subjective a priori information concerning the probable
values of the parameter. This information may often be reasonably
summarized and formalized by the choice of a suitable prior
distribution on the parameter space. The fact that the decision maker
cannot specify every detail of his prior distribution by direct
assessment means that there will often be considerable latitude in the
choice of the family of distributions to be used, even though the
selection of a particular member within the chosen family will
usually be wholly determined by the decision maker's expressed beliefs
or betting odds. Three characteristics are particularly desirable for
a family of prior distributions:

(i) analytical tractability in three aspects, namely

a) it should be reasonably easy to determine the
posterior distribution resulting from a given prior and sample,

b) it should be possible to express in convenient
form the expectations of some simple utility functions with respect
to any member of it,

and

c) the family should be closed in the sense that if
the prior is a member of it, the posterior will also be a member of it;

(ii) the family should be rich, so that there will exist a
member of it capable of expressing the decision maker's prior beliefs
or at least approximating them well;

and

(iii) it should be parametrizable in a manner which can
readily be interpreted, so that it will be easy to verify that the
chosen member of the family is really in close agreement with the
decision maker's prior judgments about $\theta$ and not a mere artifact
agreeing with one or two quantitative summarizations of these
judgments.

A family of prior densities which gives rise to posteriors
belonging to the same family is very useful inasmuch as one aspect
of mathematical tractability is maintained, and this property has
been termed "closure under sampling". For densities which admit
sufficient statistics of fixed dimensionality, a concept to be
explained below, Raiffa and Schlaifer (1961) have considered a
method of generating prior densities on the parameter space that
possess the "closure under sampling" property. A family of such
densities has been called by them a "natural conjugate family".

To define the concepts of sufficient statistic and sufficient
statistic of fixed dimensionality, consider a statistical problem in
which a large amount of experimental data has been collected. The
treatment of the data is often simplified if the statistician
computes a few numerical values, or statistics, and considers these
values as summaries of the relevant information in the data. In some
problems, a statistical analysis that is based on these few summary
values can be just as effective as any analysis that could be
based on all observed values. If the summaries are fully informative
they are known as sufficient statistics. Formally, suppose that $\theta$ is
a parameter which takes a value in the space $\Theta$. Also suppose that $x$
is a random variable, or random vector, which takes values in the
sample space $S$. We shall let $f(\cdot \mid \theta_i)$ denote the
conditional probability density function (p.d.f.) of $x$ when
$\theta = \theta_i$ ($\theta_i \in \Theta$). It is assumed that the observed
value of $x$ will be available for making inferences and decisions related
to the parameter $\theta$. Call any function $T$ of the observations $x$ a
statistic. Loosely speaking, a statistic $T$ is called a sufficient
statistic if, for any prior distribution of $\theta$, its posterior
distribution depends on the observed value of $x$ only through $T(x)$. More
formally, for any prior p.d.f. $g(\theta)$ and any observed value $x \in S$,
let $g(\cdot \mid x)$ denote the posterior p.d.f. of $\theta$, assuming for
simplicity that for every value of $x \in S$ and every prior p.d.f. $g$, the
posterior $g(\cdot \mid x)$ exists and is specified by Bayes' theorem. Then
it is said that a statistic $T$ is sufficient for the family of p.d.f.'s
$f(\cdot \mid \theta)$, $\theta \in \Theta$, if
$g(\cdot \mid x_1) = g(\cdot \mid x_2)$ for any prior p.d.f. $g$ and any two
points $x_1 \in S$ and $x_2 \in S$ such that $T(x_1) = T(x_2)$.

Now, consider only data generating processes which generate
independent and identically distributed random variables $x_1, x_2, \ldots$
such that, for any $n$ and any $(x_1, x_2, \ldots, x_n)$, there exists a
sufficient statistic. Sufficient statistics of fixed dimensionality are
those statistics $T$ such that $T(x_1, x_2, \ldots, x_n) = T = (T_1, T_2, \ldots, T_s)$,
where a particular value $T_j$ is a real number and the dimensionality
$s$ of $T$ does not depend on $n$. Independently of how many elements we
sample, only $s$ statistics are needed.
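
As a small numerical illustration of the definition, the sketch below takes a normal process with known variance, for which the pair (sample size, sample mean) is a sufficient statistic of fixed dimensionality $s = 2$. Two different samples with the same size and the same mean are pushed through Bayes' theorem on a grid of candidate means and yield the same posterior weights. The samples, the grid and the uniform prior weights are hypothetical choices made for this example only.

```python
from math import exp

def grid_posterior(sample, mus, prior, sigma=1.0):
    """Posterior weights over candidate means, computed from the full sample."""
    # prior weight times the normal likelihood (constants cancel on normalising)
    w = [p * exp(-sum((x - mu) ** 2 for x in sample) / (2 * sigma ** 2))
         for mu, p in zip(mus, prior)]
    total = sum(w)
    return [wi / total for wi in w]

mus = [0.5 * i for i in range(-4, 13)]    # candidate means from -2.0 to 6.0
prior = [1.0 / len(mus)] * len(mus)       # uniform prior weights on the grid

sample_a = [1.0, 2.0, 3.0]   # n = 3, mean = 2
sample_b = [0.0, 2.0, 4.0]   # n = 3, mean = 2, but different spread

post_a = grid_posterior(sample_a, mus, prior)
post_b = grid_posterior(sample_b, mus, prior)

# The largest discrepancy is at the level of floating-point rounding.
print(max(abs(a - b) for a, b in zip(post_a, post_b)))
```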

Raiffa and Schlaifer (1961) present the following method for
developing the natural conjugate prior for a given likelihood function:

(i) Let the density function of $\theta$ be $g$, where $g$ denotes either
a prior or a posterior density, and let $k$ be another function on $\Theta$
such that

(3.2.5)  $g(\theta) = \dfrac{k(\theta)}{\int_{\Theta} k(\theta)\, d\theta}.$

Then we shall write

(3.2.6)  $g(\theta) \propto k(\theta)$

and say that $k$ is a kernel of the density of $\theta$.

(ii) Let the likelihood of $x$ given $\theta$ be $l(x \mid \theta)$, and
suppose that $P$ and $k$ are functions on $x$ such that, for all $x$ and
$\theta$,

(3.2.7)  $l(x \mid \theta) = k(x \mid \theta)\, P(x).$

Then we shall say that $k(x \mid \theta)$ is a kernel of the likelihood of
$x$ given $\theta$ and that $P(x)$ is a residue of this likelihood.

(iii) Let the prior distribution of the random variable $\theta$
have a density $g'$. For any $x$ such that
$l^*(x \mid g') = \int_{\Theta} l(x \mid \theta)\, g'(\theta)\, d\theta > 0$,
it follows from Bayes' theorem that the posterior distribution of $\theta$
has a density $g''$ whose value at $\theta$ for the given $x$ is

(3.2.8)  $g''(\theta \mid x) = g'(\theta)\, l(x \mid \theta)\, N(x),$

where

$N(x) = \left[ \int_{\Theta} g'(\theta)\, l(x \mid \theta)\, d\theta \right]^{-1}.$

(iv) Now let $k'$ denote a kernel of the prior density of $\theta$. It
follows from the definitions of $k$ and $l$ and of the symbol $\propto$ that
the Bayes formula can be written

(3.2.9)  $g''(\theta \mid x) = g'(\theta)\, l(x \mid \theta)\, N(x)$

$\qquad = k'(\theta) \left[ \int_{\Theta} k'(\theta)\, d\theta \right]^{-1} k(x \mid \theta)\, P(x)\, N(x),$

$\qquad g''(\theta \mid x) \propto k'(\theta)\, k(x \mid \theta),$

where the value of the constant of proportionality for the given $x$,

(3.2.10)  $P(x)\, N(x) \left[ \int_{\Theta} k'(\theta)\, d\theta \right]^{-1},$

can always be determined by the condition

(3.2.11)  $\int_{\Theta} g''(\theta \mid x)\, d\theta = 1,$ whenever the integral exists.
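
A short worked instance of steps (i) to (iv) may help. It takes a normal process with unknown mean $\mu$ and known variance $\sigma^2$, a sample of size $n$ with sample mean $m$, and a prior kernel of the same functional form as the likelihood kernel, with parameters $m'$ and $n'$. The derivation is a sketch under these assumptions; the factor of the likelihood that does not involve $\mu$ is absorbed into the residue $P(x)$.

```latex
\begin{align*}
% likelihood kernel (terms free of \mu go into the residue P(x))
k(x \mid \mu) &= \exp\!\left[-\frac{n}{2\sigma^{2}}\,(m-\mu)^{2}\right], \\
% prior kernel of the same form
k'(\mu) &= \exp\!\left[-\frac{n'}{2\sigma^{2}}\,(\mu-m')^{2}\right], \\
% completing the square in \mu
n'(\mu-m')^{2} + n(\mu-m)^{2}
  &= n''(\mu-m'')^{2} + \frac{n'n}{n''}\,(m-m')^{2},
  \qquad n'' = n'+n,\quad m'' = \frac{n'm'+nm}{n''}, \\
% the last term is free of \mu and joins the constant of proportionality
g''(\mu \mid x) \;\propto\; k'(\mu)\,k(x \mid \mu)
  &\;\propto\; \exp\!\left[-\frac{n''}{2\sigma^{2}}\,(\mu-m'')^{2}\right].
\end{align*}
```

The posterior kernel has the same form as the prior kernel, so the family is closed under sampling, and only the two numbers $(m'', n'')$ need to be carried forward.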

Before we begin our presentation of a basic Bayesian analysis
of normal and lognormal processes, we want to emphasize that caution
should be exercised in the application of the method developed by
Raiffa and Schlaifer, as is pointed out by Box and Tiao (1972). According
to them it is often appropriate to analyze data from a scientific
investigation on the assumption that the likelihood dominates the prior,
for two reasons:

(i) A scientific investigation is not usually undertaken unless
information supplied by the investigation is likely to be considerably
more precise than information already available, that is, unless it is
likely to increase knowledge by a substantial amount. Therefore analysis
with priors which are dominated by the likelihood often realistically
represents the true inferential situation.

(ii) Even when a scientist holds strong prior beliefs about the
value of a parameter $\theta$, nevertheless, in reporting the results it
would usually be appropriate and most convincing to his colleagues if he
analyzed the data against a reference prior which is dominated by the
likelihood. He could say that, irrespective of what he or anyone else
believed to begin with, the posterior distribution represented what
someone who a priori knew very little about $\theta$ should believe in the
light of the data. Reference priors in general mean standard priors
dominated by the likelihood. [See Dickey (1973) for a general discussion
of Bayesian methods in scientific reporting.]

In general a prior which is dominated by the likelihood is one
which does not change very much over the region in which the likelihood
is appreciable and does not assume large values outside that range. We
shall refer to a prior distribution which has these properties as a
locally uniform prior. There are some difficulties, however, associated
with locally uniform priors. The choice of a prior to characterize a
situation where "nothing" (or, more realistically, little) is known a
priori has long been, and still is, a matter of dispute. Bayes
tentatively suggested that where such knowledge was lacking concerning the
nature of the prior distribution, it might be regarded as uniform. There
is an objection to Bayes' postulate. If the distribution of a continuous
parameter $\theta$ were taken to be locally uniform, then the distribution of
$\log \theta$, $\theta^{-1}$ or some other transformation of $\theta$ (which
might provide equally sensible bases for parametrizing the problem) would
not be locally uniform. Thus, application of Bayes' postulate to different
transformations of $\theta$ would lead to posterior distributions from the
same data which were inconsistent with the notion that nothing is known
about $\theta$ or functions of $\theta$. This argument is of course correct,
but the arbitrariness of the choice of parametrization does not by
itself mean that we should not employ Bayes' postulate in practice.

Box and Tiao (1972) present an argument for choosing a
particular metric in terms of which a locally uniform prior can be
regarded as noninformative about the parameters. It is important to
bear in mind that one can never be in a state of complete ignorance;
further, the statement "knowing little a priori" can only have
meaning relative to the information provided by the experiment. A prior
distribution is supposed to represent knowledge about parameters
before the outcome of a projected experiment is known. Thus, the main
issue is how to select a prior which provides little information
relative to what is expected to be provided by the intended experiment.

3.3 Nonstationary Model for Normal and Lognormal Means

It was emphasized in Section 2.3 that for many real world
data generating processes the assumption of stationarity is
questionable. Random parameter variation could be a reasonable assumption
when we are concerned with life testing models or with economic variables.
For example, in life testing models, when it is assumed that the life
of certain parts follows a lognormal distribution, the stationarity
assumption could be expected to hold over short periods of time; but
in most cases it would be expected that, for a lengthy period,
stationarity would be a doubtful assumption. Similarly, in other areas like
Cost-Volume-Profit analysis it is doubtful that the stationarity
assumption will hold over long periods of time. Variables like sales,
costs, and contribution margin are affected by economic, political
and environmental factors. In particular it was pointed out that we
are interested in gradual changes, the effects of which are not perfectly
predictable in advance for a particular period.

If a data generating process characterized by some parameter
$\theta$ is nonstationary, then it is potentially misleading to make
inferences and decisions concerning $\theta$ as if $\theta$ only took on a
single value. Instead we should be concerned with a sequence
$\theta_1, \theta_2, \ldots$ of values of $\theta$ corresponding to different
time periods, assuming the characteristics of the process may vary across
time. Several methods have been proposed to study stochastic parameter
variation [see Chernoff and Zacks (1964) and Harrison and Stevens (1976)].
Some have claimed that a reasonable approach to the effects of gradual
change might be to model the parameters of nonstationary distributions as
if they undergo independent random shifts through time [see Barry (1976),
Carter (1972), and Kamat (1976)]. Specifically they suggest the use of a
model that assumes that the mean of the distribution has a linear shift.
In those papers, it is demonstrated that when the process represented by
the model is assumed to be normal, this linear random shift model allows
analytical comparisons to be drawn if it is assumed that the successive
increments in the process mean are drawn independently from a normal
population with mean $u$ and variance $\sigma_e^2$. We intend to
use the same approach in this dissertation. Two cases are considered:
$\mu$ unknown and $\sigma^2$ known; and both $\mu$ and $\sigma^2$ unknown.

3.3.1 $\mu$ is Unknown and $\sigma^2$ is Known

For a process that has a normal density function with unknown
parameter $\mu$, Raiffa and Schlaifer (1961) show that the natural
conjugate prior is normal with parameters $m'$ and $\sigma^2/n'$. (See
Appendix I for the details of their exposition.) From the prior
distribution on $\mu_0$ and with a sequence of $n$ independent observations
$(x_1, x_2, \ldots, x_n)$ from the normal process under consideration
[$N(\mu, \sigma^2)$], the posterior distribution in period zero is obtained.
If the sample yields sufficient statistics $m$ and $n$, then the posterior
distribution is normal with parameters $n_0''$ and $m_0''$ given by

(3.3.1)  $n_0'' = n_0' + n,$

and

(3.3.2)  $m_0'' = (n_0' m_0' + n m)/(n_0' + n).$

If the mean of the distribution does not change from period to period
except by the effect of the sample information, then each posterior can
be thought of as a prior with respect to the following sample. Thus, the
posterior distribution on $\mu_0$ is the prior distribution on $\mu_1$; i.e.

(3.3.3)  $f''_N(\mu_0 \mid m_0'', \sigma^2/n_0'') = f'_N(\mu_1 \mid m_1', \sigma^2/n_1'),$

where

(3.3.4)  $m_1' = m_0'',$

and

(3.3.5)  $n_1' = n_0''.$

In general, if we assume that a fixed sample of size $n$ is employed
every time a sample is taken and if we assume that the mean is
stationary except by the effect of the sample information, then in any
given period $t$ the posterior distribution is normal with parameters
$n_t''$ and $m_t''$ given by

(3.3.6)  $n_t'' = n_t' + n,$

and

(3.3.7)  $m_t'' = (n_t' m_t' + n m)/(n_t' + n).$

This inferential model is called a stationary model since it assumes
that neither the distribution nor the parameters change from period
to period. In this case it assumes that $\mu_t$ takes on the same value
in every period and that $f_t'(\mu)$ represents the information available
about that value as of the start of the t-th period.
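
The stationary revision in (3.3.6) and (3.3.7) can be sketched in a few lines. The known variance, the starting prior parameters and the per-period sample means below are hypothetical values chosen for illustration; the function name is likewise not from the text.

```python
def stationary_update(m_prior, n_prior, m_sample, n_sample):
    """One revision under stationarity, following (3.3.6) and (3.3.7)."""
    n_post = n_prior + n_sample
    m_post = (n_prior * m_prior + n_sample * m_sample) / n_post
    return m_post, n_post

sigma2 = 4.0                        # known process variance (hypothetical)
m, n = 10.0, 2.0                    # prior parameters m' and n' (hypothetical)
sample_means = [12.0, 11.0, 13.0]   # made-up sample means, one per period
variances = []

for m_sample in sample_means:
    # each posterior becomes the prior for the next period
    m, n = stationary_update(m, n, m_sample, 5)
    variances.append(sigma2 / n)    # variance of the posterior of mu

print("final m'':", m, " final n'':", n)
print("posterior variances by period:", variances)
```

Because $n''$ grows by the sample size in every period, the variance $\sigma^2/n''$ shrinks steadily toward zero under stationarity.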

Suppose now that the process generating the observations
undergoes a mean shift between successive periods. In particular,
inferences about the mean of a normal process are considered when the
parameter $\mu$ shifts from period to period, with the shifts governed by an
independent normal process. Formally, consider a data generating
process that generates $n$ observations $x_{t1}, x_{t2}, \ldots, x_{tn}$
during time period $t$ according to a normal process with parameters
$\mu_t$ and $\sigma^2$. Assume that the parameter $\sigma^2$ is known and
does not change over time, whereas $\mu_t$ is not known and may vary over
time. In particular, values of the parameter for successive time periods
are related as

(3.3.8)  $\mu_{t+1} = \mu_t + e_{t+1}, \qquad t = 1, 2, \ldots,$

where $e_{t+1}$ is a normal "random shock" term independent of $\mu_t$ with
known mean $u$ and variance $\sigma_e^2$. That is, $\mu_t$ behaves as a
random walk. The mean in any period is equal to the mean in the previous
period plus an increment $e$, which has a normal distribution with known
mean and variance.

Before the sample is taken at time $t$, we assume that a prior
density function could be assessed that represents judgment (based
on past experience, past information, etc.) concerning the probabilities
for the possible values of $\mu_t$. If the prior distribution of $\mu_t$ at
the beginning of time period $t$ is represented by $f'(\mu_t)$, and a sample
of size $n$ during period $t$ yields $x_t = (x_{t1}, \ldots, x_{tn})$, then
the prior distribution of $\mu_t$ can be revised. Furthermore, at the end of
time period $t$ (the beginning of time period $t+1$) the data generating
process is governed by a new mean $\mu_{t+1}$, so it is necessary to use the
posterior distribution of $\mu_t$ and the relation (3.3.8) to determine
the prior distribution of $\mu_{t+1}$.

In order to determine the distribution of the parameter $\mu_{t+1}$,
a well known theorem could be used. It says that the convolution $g(z)$
of two normal distributions with parameters $(\mu_1, \sigma_1^2)$ and
$(\mu_2, \sigma_2^2)$ gives a distribution which is normal with mean
$(\mu_1 + \mu_2)$ and variance $(\sigma_1^2 + \sigma_2^2)$, i.e.,

(3.3.9)  $g(z) = f_N(z \mid \mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2)$

[see Mood et al. (1974)]. Thus the distribution of $\mu_{t+1}$ is normal,
i.e.,

(3.3.10)  $f'_N(\mu_{t+1} \mid m_t'' + u,\ (\sigma^2/n_t'') + \sigma_e^2), \qquad -\infty < \mu_{t+1} < \infty,$

$\qquad -\infty < m_t'' + u < \infty,$

$\qquad (\sigma^2/n_t'') + \sigma_e^2 > 0.$
We could find a simpler expression if we realize that, since $\sigma^2$ and
$\sigma_e^2$ are positive, there must exist $n_s$ such that

(3.3.11)  $\sigma_e^2 = \sigma^2/n_s,$

or $n_s = \sigma^2/\sigma_e^2$.

In other words, the disturbance variance is a multiple of the
process variance. The prior distribution of the mean after $t$ periods then
simplifies to

(3.3.12)  $f'_N(\mu_{t+1} \mid m_t'' + u,\ \sigma^2 (n_t'' + n_s)/(n_t'' n_s)),$

or

(3.3.13)  $f'_N(\mu_{t+1} \mid m_{t+1}',\ \sigma^2/n_{t+1}'),$

where

(3.3.14)  $m_{t+1}' = m_t'' + u,$

and

(3.3.15)  $n_{t+1}' = [n_t'' n_s/(n_t'' + n_s)] < n_t''.$

The inequality stated above can be interpreted as showing that the
presence of nonstationarity produces greater uncertainty (variance)
at the start of period $t+1$ than would be present under stationarity,
because in the stationary case $n_{t+1}' = n_t''$. If we assume that a change
in the mean occurs between every two consecutive periods, then we could
repeat the previous procedure each time a change occurs to determine
the new prior distribution.
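
A minimal sketch of the repeated procedure follows: each period applies the sample revision of (3.3.6) and (3.3.7) and then the shift of (3.3.14) and (3.3.15). All numerical values are hypothetical. The closed-form value used for comparison at the end is obtained here by solving the fixed-point equation of the recursion for $n'$; it is an assumption of this sketch, not a formula stated above.

```python
from math import sqrt

def update(m_prior, n_prior, m_sample, n_sample):
    """Sample revision, (3.3.6) and (3.3.7)."""
    n_post = n_prior + n_sample
    return (n_prior * m_prior + n_sample * m_sample) / n_post, n_post

def shift(m_post, n_post, u, n_s):
    """Random-shock step, (3.3.14) and (3.3.15): posterior at t -> prior at t+1."""
    return m_post + u, n_post * n_s / (n_post + n_s)

n_sample = 5          # fixed sample size per period (hypothetical)
n_s = 10.0            # sigma^2 / sigma_e^2, see (3.3.11) (hypothetical)
u = 0.0               # mean of the shock (hypothetical)
m, n_prior = 10.0, 2.0

history = []
for _ in range(50):
    m_post, n_post = update(m, n_prior, 10.0, n_sample)  # sample mean held at 10
    m, n_prior = shift(m_post, n_post, u, n_s)
    history.append(n_prior)

# Fixed point of n' -> (n' + n) n_s / (n' + n + n_s), derived for this sketch.
limit = (-n_sample + sqrt(n_sample ** 2 + 4 * n_sample * n_s)) / 2

print("first few n':", history[:4])
print("n' after 50 periods:", history[-1], " fixed point:", limit)
```

With these numbers $n'$ settles near a finite value instead of growing without bound, so the prior variance $\sigma^2/n'$ does not shrink to zero, in contrast with the stationary revision.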

For a process that has a lognormal density function as defined
in (A1.14), it was shown in Appendix I that, when the unknown
parameter is $\mu$, the natural conjugate prior is normal. Thus, the revision
of the prior distribution in any given period is identical to the
revision in the normal case [see equations (3.3.6) and (3.3.7)] except that
$m$ is defined as the sample mean of the natural logarithms of the observed
$x$ values. Furthermore, the procedure presented before to represent
changes in the mean, $\mu$, of the normal distribution can be used to model
changes in the shift parameter $\mu$ of the lognormal distribution. The
normality of the natural conjugate prior, in this case, allows us to
use the formulas (3.3.8)-(3.3.15) to study the behavior of the prior
distribution of $\mu$ after $t$ periods of time.
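
The sketch below illustrates two points for the lognormal case. First, the statistic $m$ is computed from the natural logarithms of the observations. Second, under the usual parametrization in which $\ln x$ is normal with mean $\mu$ and variance $\sigma^2$ (assumed here to agree with the definition in Chapter Two), the mean and variance of $x$ are $\exp(\mu + \sigma^2/2)$ and $\exp(2\mu + \sigma^2)(\exp(\sigma^2) - 1)$, so a shift in $\mu$ moves both. The data and parameter values are hypothetical.

```python
from math import exp, log

def log_sample_mean(xs):
    """Statistic m for the lognormal case: mean of the natural logs."""
    return sum(log(x) for x in xs) / len(xs)

def lognormal_mean_var(mu, sigma2):
    """Mean and variance of x when ln(x) is normal with mean mu, variance sigma2."""
    mean = exp(mu + sigma2 / 2)
    var = exp(2 * mu + sigma2) * (exp(sigma2) - 1)
    return mean, var

xs = [1.0, exp(1.0), exp(2.0)]      # hypothetical observations; logs are 0, 1, 2
m = log_sample_mean(xs)

sigma2 = 0.25                       # hypothetical spread parameter
mean0, var0 = lognormal_mean_var(0.0, sigma2)
mean1, var1 = lognormal_mean_var(0.5, sigma2)   # mu shifted by 0.5

print("m =", m)
print("mean ratio:", mean1 / mean0, " variance ratio:", var1 / var0)
```

A shift of size $c$ in $\mu$ multiplies the mean of $x$ by $e^{c}$ and its variance by $e^{2c}$.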

Since the variance $V(x)$ of the lognormal random variable $x$ is
a function of $\mu$ and $\sigma^2$ in the lognormal case, nonstationarity in
$\mu$ means that both the mean and the variance of $x$ are nonstationary, so
that the lognormal case provides a generalization of the normal results.

3.3.2 $\mu$ and $\sigma^2$ Both Unknown

The results of the previous section can be extended to the case
of unknown mean and variance. The joint natural conjugate prior density
function for $\mu$ and $\sigma^2$ is a normal-gamma-2 function, as was shown
in Appendix I, given by


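(3.3.16)  $f'_{N\gamma 2}(\mu, \sigma^2 \mid m', v', n', d') = \dfrac{\sqrt{n'}\, \exp\!\left[-\dfrac{n'}{2\sigma^2}(\mu - m')^2\right] \exp\!\left[-\dfrac{d'v'}{2\sigma^2}\right] \left[\dfrac{d'v'}{2\sigma^2}\right]^{\frac{d'}{2} - 1} \left[\dfrac{d'v'}{2}\right]}{\sigma \sqrt{2\pi}\; \Gamma(d'/2)}.$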

Given a prior from this family and assuming that information is
available from a normal (or lognormal) process through a sample of
observations $x_1, x_2, \ldots, x_n$, it is possible to obtain a posterior
distribution of the two parameters $\mu$ and $\sigma^2$. It was shown in
Appendix I that the posterior distribution is also normal-gamma-2, i.e.,
$f''_{N\gamma 2}(\mu, \sigma^2 \mid m'', v'', n'', d'')$, where

(3.3.17)  $m'' = (n'm' + nm)/(n' + n),$

(3.3.18)  $v'' = [d'v' + n'm'^2 + dv + nm^2 - n''m''^2]/(d' + n),$

(3.3.19)  $n'' = n' + n,$

and

(3.3.20)  $d'' = d' + n.$
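
The revision (3.3.17)-(3.3.20) can be sketched as a single function. The sketch assumes that the sample statistics are the sample mean $m$, the sample variance $v$ computed with divisor $d = n - 1$, and that at least two observations are available; the prior parameters and the data are hypothetical values.

```python
def normal_gamma_update(m_prior, v_prior, n_prior, d_prior, xs):
    """Posterior (m'', v'', n'', d'') following (3.3.17)-(3.3.20)."""
    n = len(xs)                                   # sample size, needs n >= 2
    m = sum(xs) / n                               # sample mean
    d = n - 1                                     # degrees of freedom of v
    v = sum((x - m) ** 2 for x in xs) / d         # sample variance
    n_post = n_prior + n                                          # (3.3.19)
    m_post = (n_prior * m_prior + n * m) / n_post                 # (3.3.17)
    d_post = d_prior + n                                          # (3.3.20)
    v_post = (d_prior * v_prior + n_prior * m_prior ** 2
              + d * v + n * m ** 2 - n_post * m_post ** 2) / d_post  # (3.3.18)
    return m_post, v_post, n_post, d_post

# Hypothetical prior parameters and observations.
m_post, v_post, n_post, d_post = normal_gamma_update(
    10.0, 4.0, 2.0, 3.0, [9.0, 11.0, 12.0, 14.0])
print(m_post, v_post, n_post, d_post)
```

The numerator of (3.3.18) can equivalently be written as $d'v' + dv + (n'n/n'')(m - m')^2$, which makes visible that disagreement between the prior mean and the sample mean inflates $v''$.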

It is clear from (3.3.16) that the joint density of $\mu$ and $\sigma^2$
is the product of a normal density for $\mu$, conditional on $\sigma^2$, and
a marginal density for $\sigma^2$, i.e.,

(3.3.21)  $f''_{N\gamma 2}(\mu, \sigma^2 \mid m'', v'', n'', d'') = f''_N(\mu \mid \sigma^2, n'', m'')\, f''_{\gamma 2}(\sigma^2 \mid v'', d'').$

The marginal density of $\sigma^2$ does not depend on $\mu$. Now consider the
case of nonstationary $\mu$ as in the previous section. The independence of
the marginal distribution of $\sigma^2$ from $\mu$ will be an important factor
in our results below.

At the end of period $t$ (the beginning of time period $t+1$) the
posterior distribution of $\mu_t$ and $\sigma^2$ could be used in conjunction
with the relation between $\mu_t$ and the random shock $e_{t+1}$ to get the
joint prior distribution at the beginning of period $t+1$. As before, the
random shock model to be considered is $\mu_{t+1} = \mu_t + e_{t+1}$. We make
the assumption that although $\sigma^2$ is unknown, it is known that the
variance of $e_t$, $\sigma_e^2$, is $1/n_s$ times the unknown process
variance, $\sigma^2$. As before, assuming that $\mu$ has a posterior
distribution with parameters $(m'', \sigma^2/n'')$ and that $e$ is distributed
normally with parameters $(u, \sigma^2/n_s)$, it was shown in Appendix I that
the convolution $z$ ($z = \mu + e$) has a conditional density given by

(3.3.22)  $g(z) = f_N(z \mid m'' + u,\ \sigma^2[(1/n'') + (1/n_s)]).$

Note that this density is conditional on $\sigma^2$, as is the conjugate
prior of $\mu$. Thus, the prior density of $\mu_{t+1}$ at the beginning of
period $t+1$, after the random shock has occurred, is given by

(3.3.23)  $f'_N(\mu_{t+1} \mid m_t'' + u,\ \sigma^2[(n_s + n_t'')/(n_t'' n_s)]).$

Since $\sigma^2$ is assumed constant, $f_{\gamma 2}(\sigma^2)$ does not change
but equals the posterior distribution at the end of period $t$. Hence, the
joint distribution at the beginning of period $t+1$ is given by

(3.3.24)  $f'_{N\gamma 2}(\mu_{t+1}, \sigma^2) = f_N(\mu_{t+1} \mid m_t'' + u,\ \sigma^2[(n_s + n_t'')/(n_t'' n_s)])\, f''_{\gamma 2}(\sigma^2 \mid v_t'', d_t'').$

If we let

(3.3.25)  $m_{t+1}' = m_t'' + u,$

(3.3.26)  $n_{t+1}' = n_t'' n_s/(n_s + n_t''),$

(3.3.27)  $d_{t+1}' = d_t'',$

and

(3.3.28)  $v_{t+1}' = v_t'',$

then the distribution of $\mu$ and $\sigma^2$ could be written as

(3.3.29)  $f'_{N\gamma 2}(\mu_{t+1}, \sigma^2 \mid m_{t+1}', v_{t+1}', n_{t+1}', d_{t+1}').$

The revision could be continued since the prior distribution
at the beginning of period $t+1$ is still a normal-gamma-2 distribution.
At any time $t$, the process mean is not known with certainty, but the
information from the samples collected up to time $t$ provides an
indication of $\mu_t$. Before the sample is taken at time $t$, we assume that
one is capable of assessing a prior density function that represents
one's judgment (based on past experience, past information, etc.)
concerning the probabilities for the possible values of $\mu_t$ and
$\sigma^2$. In effect, one views $(\mu_t, \sigma^2)$ as a pair of random
variables to which a probability density function has been assigned, in
this case a normal-gamma-2 with parameters $m_t'$, $n_t'$, $v_t'$ and $d_t'$.
The sample results at time $t$ can be described in terms of the sufficient
statistics $m_t$, $n_t$, $v_t$ and $d_t$: sample mean, sample size, sample
variance and degrees of freedom needed to determine $v_t$, respectively.
Using these sample results, a new posterior distribution could be obtained
which is normal-gamma-2. The tractability of the model is maintained when
a natural conjugate prior is used and a shift model of the form (3.3.8) is
assumed for the changes of the parameter $\mu$ between two consecutive
periods. Hence, after $t$ periods of time the joint distribution of $\mu$
and $\sigma^2$ is normal-gamma-2; that is,

(3.3.30)  $f'_{N\gamma 2}(\mu_{t+1}, \sigma^2 \mid m_{t+1}', n_{t+1}', v_{t+1}', d_{t+1}'),$

where

(3.3.31)  $d_{t+1}' = d_1' + (t)n,$

(3.3.32)  $n_{t+1}' = (n_t' + n)\, n_s/[(n_t' + n) + n_s],$

(3.3.33)  $v_{t+1}' = [d_t' v_t' + n_t' (m_t')^2 + dv + nm^2 - n_t'' (m_t'')^2]/[d_t' + n],$

and

(3.3.34)  $m_{t+1}' = (n_t' m_t' + nm)/(n_t' + n) + u.$

In this manner, a sequence of prior and posterior distributions for
successive $\mu_t$ may be obtained as successive values of the random vector
$\tilde{x}_t = (x_{t1}, \ldots, x_{tn})$ are observed.

For the process that has a lognormal density function as defined in (A1.14), it was shown before that when both parameters are unknown the joint natural conjugate prior is normal-gamma-2. Thus, the revision of the prior distribution in any given period is identical to the revision in the normal case. Furthermore, the procedure presented previously to represent changes in the mean, $\mu$, of the normal distribution could be used to model changes in the shift parameter $\mu$ of the lognormal. The fact that both normal and lognormal distributions have a joint natural conjugate prior which is normal-gamma-2 allows us to use the formulas (3.3.30 - 3.3.34) to study the behavior of the prior distribution of $\mu$ and $\sigma^2$ after t periods.
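The revision cycle of this section (sample revision followed by the random shock) can be sketched in code. The block below is only an illustration under stated assumptions, not the procedure as implemented by the author: the names `NormalGamma2`, `revise_with_sample` and `apply_shock` are hypothetical, the sample degrees of freedom are assumed to be n - 1, and the variance update uses the standard normal-gamma-2 form in which the term $n''m''^2$ is subtracted.

```python
from typing import NamedTuple


class NormalGamma2(NamedTuple):
    """Hypothetical container for the parameters (m, n, v, d)."""
    m: float  # location parameter of the mean
    n: float  # equivalent sample size for the mean
    v: float  # variance estimate
    d: float  # degrees of freedom


def revise_with_sample(prior, sample_mean, sample_var, n):
    """Prior at the start of period t -> posterior at the end of period t."""
    d_sample = n - 1  # assumption: degrees of freedom of the sample variance
    n_post = prior.n + n
    m_post = (prior.n * prior.m + n * sample_mean) / n_post
    d_post = prior.d + n
    v_post = (prior.d * prior.v + prior.n * prior.m ** 2
              + d_sample * sample_var + n * sample_mean ** 2
              - n_post * m_post ** 2) / d_post
    return NormalGamma2(m_post, n_post, v_post, d_post)


def apply_shock(post, u, n_s):
    """Posterior at the end of period t -> prior at the start of period t+1,
    following (3.3.25)-(3.3.28): only m and n change."""
    return NormalGamma2(post.m + u, post.n * n_s / (n_s + post.n), post.v, post.d)


prior = NormalGamma2(m=0.0, n=4.0, v=1.0, d=3.0)
post = revise_with_sample(prior, sample_mean=1.0, sample_var=2.0, n=4)
next_prior = apply_shock(post, u=0.0, n_s=8.0)
print(post, next_prior)
```

Because the shock leaves v and d untouched, the same two functions serve the lognormal case when the observations are replaced by their logarithms.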

3.3.3  Stationary  Versus  Nonstationary  Results 

Stationary conditions, in the context of our discussion, imply that there is no shift in the mean, $\mu$, of the distribution; that is, $e_t = 0$ and consequently $u$ and $\sigma_e^2$ are both zero. Successive values of $\mu$ are the same across time, i.e., $\mu_1 = \mu_2 = \ldots = \mu_t$. For the case when only $\mu$ is unknown, this implies that equation (3.3.10) becomes,

(3.3.35)  $f'_N(\mu_{t+1} \mid m''_t + 0,\ (\sigma^2/n''_t) + 0)$,

or

(3.3.36)  $f'_N(\mu_{t+1} \mid m''_t,\ \sigma^2/n''_t)$.

Under stationarity, then, the prior distribution of $\mu_{t+1}$ at the start of period t+1 is the same as the posterior distribution of $\mu_t$ at the end of period t. In the case of nonstationarity with no drift, u=0; in other words, the distribution of $e$ is normal with mean 0 and variance $\sigma_e^2$. For this case it is clear that for a given posterior distribution of $\mu_t$ at time t, the only difference between the prior distribution of $\mu_{t+1}$ under stationarity (see equation 3.3.36) and the prior distribution of $\mu_{t+1}$ under nonstationarity (see equation 3.3.10) is the variance term. The prior variance of $\mu_{t+1}$ under stationarity is

(3.3.37)  $\mathrm{Var}_S(\mu_{t+1}) = \sigma^2/n'_{t+1} = \sigma^2/n''_t$;

whereas the prior variance of $\mu_{t+1}$ under nonstationarity is

(3.3.38)  $\mathrm{Var}_{NS}(\mu_{t+1}) = \sigma^2/n'_{t+1} = (\sigma^2/n''_t) + (\sigma^2/n_s) = \sigma^2[(1/n''_t) + (1/n_s)]$.

As expected, the incorporation of the nonstationary condition has caused an increase in the variance of the prior distribution. The variance increased by an amount $\sigma^2/n_s$; that is, by an amount equal to the variance of the distribution of successive increments in the process mean. For the stationary case

(3.3.39)  $(n'_{t+1})^{-1} = [1/n''_t]$,

and for the nonstationary case,

(3.3.40)  $(n'_{t+1})^{-1} = [(1/n''_t) + (1/n_s)]$.

Thus, equivalently, we could say that for a given posterior distribution of $\mu_t$ at time t, the only difference between the prior distribution of $\mu_{t+1}$ under stationarity and under nonstationarity is that the term $n'_{t+1}$ is larger with the stationary condition. When $u \neq 0$, $m'$ is always changing and, therefore, there is a difference in mean and variance.
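A small numerical check of (3.3.37)-(3.3.40) may help fix ideas. The sketch below is illustrative only; the function names and the numerical values are assumptions chosen for the example, and the stationary case is represented by passing no shock parameter.

```python
def prior_equivalent_n(n_post, n_s=None):
    """n'_{t+1}: equals n''_t under stationarity (n_s is None); otherwise
    1/n'_{t+1} = 1/n''_t + 1/n_s as in (3.3.40)."""
    if n_s is None:
        return n_post
    return 1.0 / (1.0 / n_post + 1.0 / n_s)


def prior_variance(sigma2, n_post, n_s=None):
    """Prior variance of the mean at the start of period t+1."""
    return sigma2 / prior_equivalent_n(n_post, n_s)


sigma2, n_post, n_s = 4.0, 10.0, 5.0  # hypothetical values
var_stationary = prior_variance(sigma2, n_post)
var_nonstationary = prior_variance(sigma2, n_post, n_s)
# The difference equals sigma2 / n_s, the variance of the random shock.
print(var_stationary, var_nonstationary, var_nonstationary - var_stationary)
```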

Stationary conditions, in the case when both $\mu$ and $\sigma^2$ are unknown, imply that in any given period t+1 the joint prior density for $\mu_{t+1}$ and $\sigma^2$ is a normal-gamma-2 of the form given in equations (3.3.30 - 3.3.34). That is,

(3.3.41)  $f'_{N\gamma\text{-}2}(\mu_{t+1}, \sigma^2 \mid m'_{t+1}, n'_{t+1}, v'_{t+1}, d'_{t+1}) = f'_N(\mu_{t+1} \mid m'_{t+1}, \sigma^2/n'_{t+1})\ f'_{\gamma\text{-}2}(\sigma^2 \mid v'_{t+1}, d'_{t+1})$,

where

(3.3.42)  $m'_{t+1} = m''_t$,

(3.3.43)  $v'_{t+1} = v''_t$,

(3.3.44)  $n'_{t+1} = n''_t$,

and

(3.3.45)  $d'_{t+1} = d''_t$.

Under stationarity, then, the joint prior distribution of $\mu_{t+1}$ and $\sigma^2$ at the start of period t+1 is the same as the posterior distribution of $\mu_t$ and $\sigma^2$ at the end of period t. Since the distribution of $\sigma^2$ does not depend on $\mu$, only on the parameters d and v, we could model changes in $\mu$. These changes in the mean only affect the function $f'_N(\mu_{t+1} \mid m'_{t+1}, \sigma^2/n'_{t+1})$ in equation (3.3.41). In fact, the effect of the nonstationarity assumption on $f'_N(\mu_{t+1})$ is identical to the effect of nonstationarity on the prior distribution in the case when only $\mu$ was the unknown parameter. In the case of nonstationarity with no drift, i.e., u=0, for a given posterior distribution of $\mu_t$ and $\sigma^2$ at time t, the joint prior density function for $\mu_{t+1}$ and $\sigma^2$ is similar to its stationary counterpart, as given in equation (3.3.41), except for the fact that the variance of $f'_N(\mu_{t+1} \mid m'_{t+1}, \sigma^2/n'_{t+1})$ is larger than the variance of $f'_N(\mu_{t+1} \mid m'_{t+1}, \sigma^2/n'_{t+1})$ in the stationary case. In other words, $\sigma^2/n'_{t+1}$ in the stationary case is smaller than $\sigma^2/n'_{t+1}$ in the nonstationary case.

The nonstationarity assumption also affects the predictive distribution. For the case when $\mu$ is the unknown parameter and the data generating process is normal, assume that after t periods we have a posterior distribution $f''(\mu_t)$ which is normal with mean $m''_t$ and variance $\sigma^2/n''_t$. The predictive distribution at the end of period t was shown in equation (A1.32) to be normal with mean,

(3.3.46)  $E_t(x_t) = m''_t$,

and variance

(3.3.47)  $\mathrm{Var}_t(x_t) = \sigma^2[(1 + n''_t)/n''_t] = \sigma^2[1 + (1/n''_t)]$.

If the process is stationary then the predictive distribution of the random variable of interest at the beginning of period t+1 is the same as the distribution we had at the end of period t, i.e., $N(m''_t, \sigma^2[(1 + n''_t)/n''_t])$. However, if we assume the nonstationary condition, the prior distribution of $\mu$ at the start of period t+1 has a different mean and a different variance. Consequently the predictive distribution changes in mean and variance between consecutive time periods. In other words, $E_{t+1}(x_{t+1})$ is always changing depending on the stochastic change of the mean $\mu_{t+1}$. In the case of nonstationarity with no drift, i.e., u=0, for a given posterior distribution of $\mu_t$ at time t, the only difference between the predictive distribution of $x_{t+1}$ under stationarity and the predictive distribution of $x_{t+1}$ under nonstationarity is the variance term. The variance of $x_{t+1}$ under stationarity, at the start of time period t+1, is

(3.3.48)  $\mathrm{Var}_{t+1}(x_{t+1}) = \sigma^2[(1 + n'_{t+1})/n'_{t+1}] = \sigma^2[1 + (1/n'_{t+1})]$.

It was stated previously that the parameter $n'_{t+1}$ is smaller when $\mu$ is unknown and nonstationary than when $\mu$ is unknown but stationary. Hence, as expected, the variance of the predictive distribution, $\mathrm{Var}_{t+1}(x_{t+1})$, is larger when $\mu$ is nonstationary. This has some implications for the determination of prediction intervals, which we will discuss in detail in Chapter Four. Nonstationarity implies greater uncertainty, which is reflected by an increase in the measure of uncertainty, variance.

For the case when both $\mu$ and $\sigma^2$ are the unknown parameters and the data generating process is normal, assume that after t periods we have a posterior distribution $f''(\mu_t, \sigma^2)$ which is normal-gamma-2 with parameters $m''_t$, $n''_t$, $v''_t$ and $d''_t$. The predictive distribution at the end of period t was shown in equation (A1.33) to be Student with mean,

(3.3.49)  $E_t(x_t) = m''_t$,  $d''_t > 1$,

and variance

(3.3.50)  $\mathrm{Var}_t(x_t) = [v''_t(n''_t + 1)/n''_t][d''_t/(d''_t - 2)]$,  $d''_t > 2$.

Again, if the process is stationary then the predictive distribution at the beginning of period t+1 is the same as the distribution that we had at the end of period t, i.e., $ST(m''_t, [v''_t(n''_t + 1)/n''_t][d''_t/(d''_t - 2)])$.

When we assume the nonstationary condition, the joint prior distribution of $\mu$ and $\sigma^2$ at the start of period t+1 changes from its original form at the end of period t. The specific random model we are assuming causes the parameters m and n of the distribution of $\mu$ to change from the end of period t to the start of period t+1. Therefore the predictive distribution $f'_{t+1}(x_{t+1})$ has a different mean and variance than $f''_t(x_t)$. In the case of nonstationarity with no drift, i.e., u=0, for a given posterior distribution of $\mu_t$ and $\sigma^2$ at time t, the only difference between the predictive distribution of $x_{t+1}$ under stationarity vis-a-vis nonstationarity is the variance term. Observing equation (3.3.50) closely we note that the effect of nonstationarity is the same as in all previous cases; that is, the parameter $n'_{t+1}$ is smaller when $\mu$ is nonstationary and therefore the variance is larger. In this case, since both $\mu$ and $\sigma^2$ are unknown, at the end of period t our estimate of the variance is $v''_t$, which includes all the information that we have available at the time, including sample information.
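The effect of the smaller $n'_{t+1}$ on the Student predictive variance of (3.3.50) can be illustrated numerically. This is a minimal sketch with hypothetical function names and hypothetical parameter values; it only evaluates the variance formula with the stationary and the nonstationary value of the equivalent sample size.

```python
def shocked_n(n_post, n_s):
    """n'_{t+1} = n''_t n_s / (n_s + n''_t) under nonstationarity."""
    return n_post * n_s / (n_s + n_post)


def student_predictive_variance(v, n_equiv, d):
    """Variance of the Student predictive distribution, as in (3.3.50).
    It exists only when d > 2."""
    if d <= 2:
        raise ValueError("the predictive variance requires d > 2")
    return v * (n_equiv + 1.0) / n_equiv * d / (d - 2.0)


v_post, n_post, d_post, n_s = 2.0, 10.0, 6.0, 5.0  # hypothetical values
var_stat = student_predictive_variance(v_post, n_post, d_post)
var_nonstat = student_predictive_variance(v_post, shocked_n(n_post, n_s), d_post)
print(var_stat, var_nonstat)
```

With these values the shock reduces the equivalent sample size from 10 to 10/3, and the predictive variance rises accordingly while v and d are unchanged.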

A comparison of stationary versus nonstationary results when the data generating process is lognormal moves along the same lines as the normal process does. For the case where the unknown parameter is $\mu$, the nonstationarity condition causes an increase in the variance and in the mean of the normal prior distribution, which causes an increase in the mean and variance of the lognormal predictive distribution. Similarly, for the case when both parameters are unknown, the condition causes an increase in mean and variance in the prior distribution of $\mu$ and a change in the joint prior distribution of $\mu$ and $\sigma^2$, which affects the logStudent predictive distribution. The logStudent predictive distribution has infinite mean and variance, which are not affected by the nonstationary condition.

3.4  Conclusion
In this chapter we modeled nonstationarity in the mean of normal and lognormal processes under two uncertainty assumptions. The model is built upon the Bayesian analysis of normal processes of Raiffa and Schlaifer (1961) and upon the analysis of nonstationary means of normal processes, for unknown $\mu$, of Barry (1973). We extended the nonstationary results of Barry (1973) to the lognormal distribution. The variance of the lognormal distribution is given by

(3.4.1)  $\mathrm{Var}(x) = w(w - 1)e^{2\mu}$,

where $w = \exp(\sigma^2)$.
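To see why a shift in $\mu$ alone moves both moments of a lognormal variable, the variance formula (3.4.1) can be evaluated alongside the standard lognormal mean $\exp(\mu + \sigma^2/2)$. The sketch below is illustrative; the function names and the numerical values are assumptions.

```python
import math


def lognormal_mean(mu, sigma2):
    """Mean of x when ln x is normal with mean mu and variance sigma2."""
    return math.exp(mu + sigma2 / 2.0)


def lognormal_variance(mu, sigma2):
    """Variance of x, in the form w(w - 1)exp(2 mu) with w = exp(sigma2)."""
    w = math.exp(sigma2)
    return w * (w - 1.0) * math.exp(2.0 * mu)


# A unit shift in mu with sigma2 held fixed (hypothetical values):
before = (lognormal_mean(0.0, 1.0), lognormal_variance(0.0, 1.0))
after = (lognormal_mean(1.0, 1.0), lognormal_variance(1.0, 1.0))
print(before, after)
```

A shift of size delta in $\mu$ multiplies the mean by exp(delta) and the variance by exp(2 delta), so neither moment is stationary when $\mu$ is not.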

Since V(x) is a function of $\mu$ and $\sigma^2$ in the lognormal case, nonstationarity in $\mu$ means that both mean and variance of x are nonstationary, so that the lognormal case provides a generalization of the normal results. Furthermore, we developed the nonstationary model for the mean of normal and lognormal processes for the case when both parameters, $\mu$ and $\sigma^2$, are unknown. For each group of assumptions we noted that, in every time period t, the uncertainty is never fully eliminated from the model.

In Chapter Two we emphasized that the exponential distribution was often used to represent life testing models. All the research in the area of life testing where this distribution has been used has assumed stationary conditions for the parameters of the model and for the model itself. Appendix II shows the Bayesian modeling of nonstationarity for the parameters of an exponential distribution using random shock models. Only under very trivial assumptions does the analysis yield tractable and consequently useful results. On the other hand, as was shown in this chapter, the normal and lognormal distributions provide results that are especially tractable.

In any given period t, the prior, posterior and predictive distributions depend on the parameters $m_t$ and $n_t$ when only $\mu$ is unknown; and on the parameters $m_t$, $n_t$, $v_t$ and $d_t$ when both $\mu$ and $\sigma^2$ are unknown. Under the nonstationarity conditions, these parameters change from period to period not only because new information becomes available through the sample, but because of the additional uncertainty involving the shifts in the parameter $\mu$. To make better use of these distributions the decision maker must know how they are evolving through time. Management requires realistic and accurate information to aid in decision making. For instance, the decision maker can be interested in knowing how the variance of the distribution of the mean, $\mu$, changes across time. Furthermore, since one of the objectives of the user of the distribution is to construct prediction intervals for the process variable, he can be interested in knowing how the variance of the predictive distribution behaves as the number of observed periods increases. We will address this problem in detail in Chapter Four through the study of the limiting behavior of the parameters $m_t$, $n_t$, $v_t$ and $d_t$. In addition, attention will be focused on the methods of constructing prediction intervals for the normal, Student, lognormal and logStudent distributions under various uncertainty conditions.


CHAPTER  FOUR 

LIMITING  RESULTS  AND  PREDICTION  INTERVALS  FOR  NONSTATIONARY 
NORMAL  AND  LOGNORMAL  PROCESSES 
4.1  Introduction
In Chapter Three we emphasized that for many real world data generating processes the assumption of stationarity is questionable and stochastic parameter variation seems to be a reasonable assumption. If a data generating process characterized by some parameter is nonstationary, then it is potentially misleading to make inferences and decisions concerning the parameter as if it only took on a single value. We should be concerned with a sequence of values of the parameter corresponding to different time periods. It was shown in Chapter Three that if we use a particular stochastic model we can model nonstationarity for the shift parameter of normal and lognormal processes from a Bayesian viewpoint, under two uncertainty conditions, and that we can obtain tractable results. In particular, values of the parameter for successive time periods are assumed to be related as

(4.1.1)  $\mu_{t+1} = \mu_t + e_{t+1}$,  t = 1, 2, ... ,

where $e_{t+1}$ is a normal "random shock" term independent of $\mu_t$ with known mean $u$ and variance $\sigma_e^2$. The mean in any period t is equal to the mean in the previous period plus an increment e, which has a normal distribution with known mean.

Comparing the stationary with the nonstationary processes, we pointed out that when the data generating process is normal or lognormal and the unknown parameter is $\mu$, the nonstationary condition causes in any given period t an increase in the variance of the normal prior distribution. This causes an increase in the variance of the normal predictive distribution for normal processes and causes an increase in the mean and variance of the lognormal predictive distribution for lognormal processes. When both parameters, $\mu$ and $\sigma^2$, are unknown a similar result is found for the prior and predictive distributions of the normal and lognormal data generating processes.

The results discussed in Chapter Three have to do with the period to period effects of random parameter variation upon the prior and predictive distributions. However, the asymptotic behavior of the model has important implications for the decision maker. For instance, when only $\mu$ is the unknown parameter, under constant parameters uncertainty about $\mu$ eventually is eliminated since $n'_t$ increases without bound and the sequence of prior variances $(\sigma^2/n'_t)$ converges to zero. Hence the distribution of $\mu_t$ eventually will be unaffected by further samples. On the other hand, shifting parameters could increase the uncertainty under which a decision must be made since it reduces the information content that past samples offer for the actual situation. Increases in uncertainty, caused by stochastic parameter variation, have important implications for the decision maker since his decisions depend upon the uncertainty under which they are made. Similarly, random parameter variation produces important differences in the limiting behavior of the prior and predictive distributions when $\mu$ and $\sigma^2$ are the unknown parameters. In Section 4.2 we study the limiting behavior of the parameters $m'_t$, $v'_t$, $n'_t$ and $d'_t$ of the prior and predictive distributions for the normal and lognormal data generating processes. In addition we discuss the implications of these limiting results for the inferences and decisions based on the posterior and predictive distributions.

In any period t, all the information contained in the initial prior distribution and in subsequent samples is fully reflected in the posterior and the predictive distributions. In some applications, partial summaries of the information are of special importance. One important way to partially summarize the information contained in the posterior distribution is to quote one or more intervals which contain a stated amount of probability. Often the problem itself will dictate certain limits which are of special interest. A rather different situation occurs when there are no limits of special interest, but an interval is needed to show a range over which "most of the probability lies".

One objective of this thesis is to develop Bayesian prediction intervals for future observations that come from normal and lognormal data generating processes. In particular, we are interested in most plausible Bayesian prediction intervals of cover $\beta$ as were defined in Section 2.2. In Section 4.3 we discuss the problem of constructing prediction intervals for normal, Student, lognormal and logStudent distributions. It is pointed out that it is easy to construct these intervals for the normal and Student distributions but that it is rather difficult for the lognormal and logStudent distributions. An algorithm is presented to compute the Bayesian prediction intervals for the lognormal and logStudent distributions. In addition, we discuss the relationship that exists between Bayesian prediction intervals under nonstationarity and classical certainty equivalent and Bayesian stationary intervals.

4.2  Special Properties and Limiting Results
Under Nonstationarity

4.2.1  Limiting Behavior of $m'_t$ and $n'_t$ When $\mu$ is the Only Unknown Parameter

For a process that has a normal density function with unknown parameter $\mu$, Raiffa and Schlaifer (1961) show that the natural conjugate prior distribution is normal with parameters $m'$ and $\sigma^2/n'$. In Section 3.3 we pointed out that if the mean, $\mu$, of the data generating process does not change from period to period except by the effect of the sample information, then each posterior can be thought of as a prior with respect to a subsequent sample. In general, if we assume that a sample of size $n_t$ is employed every time a sample is taken [which yields a statistic $m_t = (\sum_{i=1}^{n_t} x_{it}/n_t)$] and if we assume that the mean $\mu$ is stationary, then in any given period t the posterior distribution of $\mu$ is normal with parameters $n''_t$ and $m''_t$ given by

(4.2.1)  $n''_t = n'_t + n_t$

and

(4.2.2)  $m''_t = (n'_t m'_t + n_t m_t)/(n'_t + n_t)$.

In order to study the limiting values of $n'_t$ and $m'_t$ under stationary conditions, we have to characterize the posterior and predictive distributions after t periods of time have elapsed. Since the limiting results under nonstationary means will be based on a fixed sample size each period, we will make the same assumption for the stationary limiting results, that is, $n_t = n$ for all t. In period one, for a process that has a normal density function with unknown parameter $\mu$, i.e., $f_N(x \mid \mu)$, the natural conjugate prior is normal with mean $m'_1$ and variance $\sigma^2/n'_1$, i.e., $f'_N(\mu_1 \mid m'_1, \sigma^2/n'_1)$. If a sample of size n from a normal process yields the sufficient statistics $m_1$ and n, then the posterior and predictive distributions at the end of period one are given by

(4.2.3)  $f''_N[\mu_1 \mid (n'_1 m'_1 + n m_1)/(n'_1 + n),\ \sigma^2/(n'_1 + n)] = f''_N(\mu_1 \mid m''_1, \sigma^2/n''_1)$

or

$= f'_N(\mu_2 \mid m'_2, \sigma^2/n'_2)$,

and

(4.2.4)  $f_N(x_1 \mid m''_1,\ \sigma^2(1 + n''_1)/n''_1)$,

respectively.

In period two, if a sample is taken from a normal process that yields the sufficient statistics $m_2$ and n, then the posterior and predictive distributions at the end of the period are given by

(4.2.5)  $f''_N[\mu_2 \mid [n'_1 m'_1 + n(m_1 + m_2)]/(n'_1 + 2n),\ \sigma^2/(n'_1 + 2n)] = f''_N(\mu_2 \mid m''_2, \sigma^2/n''_2)$

or

$= f'_N(\mu_3 \mid m'_3, \sigma^2/n'_3)$,

and

(4.2.6)  $f_N(x_2 \mid m''_2,\ \sigma^2(1 + n''_2)/n''_2)$,

respectively.

After t samples are taken the posterior and predictive distributions are given by

(4.2.7)  $f''_N(\mu_t \mid m''_t, \sigma^2/n''_t)$

and

(4.2.8)  $f_N(x_t \mid m''_t,\ \sigma^2(1 + n''_t)/n''_t)$,

where

(4.2.9)  $m''_t = (n'_1 m'_1 + n \sum_{i=1}^{t} m_i)/(n'_1 + t n)$

and

(4.2.10)  $n''_t = n'_1 + t n$.
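The closed forms (4.2.9) and (4.2.10) are what one obtains by applying the one-period revision (4.2.1)-(4.2.2) repeatedly with a fixed sample size. The following sketch checks this numerically; the function names and the data are hypothetical and serve only as an illustration.

```python
def sequential_posterior(m1, n1, sample_means, n):
    """Apply (4.2.1)-(4.2.2) once per period; under stationarity each
    posterior becomes the prior of the next period."""
    m, k = m1, n1
    for m_t in sample_means:
        m = (k * m + n * m_t) / (k + n)
        k = k + n
    return m, k


def closed_form_posterior(m1, n1, sample_means, n):
    """Posterior parameters after t samples, as in (4.2.9)-(4.2.10)."""
    t = len(sample_means)
    n_post = n1 + t * n
    m_post = (n1 * m1 + n * sum(sample_means)) / n_post
    return m_post, n_post


means = [12.0, 9.0, 11.0]  # hypothetical sample means, n = 4 each period
print(sequential_posterior(10.0, 2.0, means, 4.0))
print(closed_form_posterior(10.0, 2.0, means, 4.0))
```

Every sample mean enters with the same weight n, regardless of how long ago it was observed.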

We pointed out in Chapter Three that if the data generating process is lognormal with unknown parameter $\mu$, then the natural conjugate prior is normal and the predictive distribution is lognormal. For this case, after t samples are taken that yield sufficient statistics $(m_1, n), (m_2, n), \ldots, (m_t, n)$ (where $m_t = [\sum_{i=1}^{n} \ln x_{it}]/n$), the posterior and predictive distributions are normal and lognormal, respectively, with parameters $m''_t$ and $n''_t$ as defined in (4.2.7 - 4.2.10).

The mean and variance of the predictive distribution when the data generating process is normal are given by

(4.2.11)  $E(x_t) = m''_t$

and

(4.2.12)  $V(x_t) = \sigma^2(n''_t + 1)/n''_t$.

On the other hand, the mean and variance of the predictive distribution for the lognormal process are given by

(4.2.13)  $E(x_t) = \exp[m''_t + \sigma^2(1 + n''_t)/2n''_t]$

and

(4.2.14)  $\mathrm{Var}(x_t) = \exp(2m''_t)\, w(w - 1)$,

where $w = \exp[\sigma^2(1 + n''_t)/n''_t]$.
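The lognormal predictive moments can be evaluated directly. The sketch below is an illustration under an explicit assumption: the variance is taken in the standard lognormal form $\exp(2m'')\,w(w - 1)$, which is the form consistent with (3.4.1). The function name and the numerical values are hypothetical.

```python
import math


def lognormal_predictive_moments(m_post, n_post, sigma2):
    """Mean and variance of the lognormal predictive distribution when
    ln x has predictive mean m_post and variance sigma2 (1 + n_post)/n_post."""
    s2 = sigma2 * (1.0 + n_post) / n_post
    w = math.exp(s2)
    mean = math.exp(m_post + s2 / 2.0)
    variance = math.exp(2.0 * m_post) * w * (w - 1.0)
    return mean, variance


# As n_post grows, the mean approaches exp(m_post + sigma2 / 2).
for n_post in (4.0, 40.0, 4000.0):
    print(n_post, lognormal_predictive_moments(0.0, n_post, 1.0))
```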

The mean and variance of the posterior distribution of $\mu$ for the normal and lognormal cases are given by

(4.2.15)  $E(\mu_t) = m''_t$

and

(4.2.16)  $\mathrm{Var}(\mu_t) = \sigma^2/n''_t$.

Since n is a positive integer for all t, $n'_t$ ($= n'_1 + (t-1)n$) increases without bound as t increases, so that the variance of the posterior distribution of $\mu$ for the normal and lognormal cases approaches zero as t increases. Intuitively, in the stationary case, the distribution of the unknown parameter becomes tighter as more information is obtained. As expected, when the data generating process is normal the variance of the predictive distribution approaches the process variance, $\sigma^2$, as t increases, i.e.,

(4.2.17)  $\lim_{t \to \infty} \{\sigma^2(1 + n''_t)/n''_t\} = \lim_{t \to \infty} \{(\sigma^2/n''_t) + \sigma^2\} = \sigma^2$,

since the uncertainty about $\mu$ is eliminated as t approaches infinity.

In any given period t, $m''_t$ is a weighted average of the prior mean at period one, $m'_1$, and of all past sample means, $m_1, m_2, \ldots, m_t$. All sample means up to period t are given the same weight, n, in the determination of the posterior mean, $m''_t$; in other words, recent observations receive the same weight as not-so-recent ones. Moreover, in any period t, the prior information contained in the parameter $m'_1$ has a weight $n'_1/(n'_1 + tn)$, which decreases as t increases. The variance of the predictive distribution for the lognormal case depends on the parameters $n''_t$ and $m''_t$ [see (4.2.14)]. For t very large, the term $w(w - 1)$ approaches a constant since $w = \exp[\sigma^2(1 + n''_t)/n''_t]$ approaches $\exp(\sigma^2)$. As t increases, the changes in the predictive variance are produced solely by changes in $m''_t$ since $w(w - 1)$ is convergent. The mean of the predictive distribution for the lognormal case also depends on $n''_t$ and $m''_t$ [see (4.2.13)]. Since the variance $\sigma^2/n''_t$ approaches zero as t increases, the posterior mean $m''_t$ approaches the unknown population mean $\mu$ of ln x. That is,

(4.2.18)  $E(x_t) \to \exp[\mu + (\sigma^2/2)]$.

Suppose now we assume, as in Chapter Three, that the process generating the observations undergoes a mean shift between successive periods. In particular, values of the parameter for successive time periods are related as

(4.2.19)  $\mu_{t+1} = \mu_t + e_{t+1}$,  $e \sim N(u, [\sigma^2/n_s])$.

We pointed out in Section 3.3 that the prior distribution of $\mu$ in any given period t+1 is given by

(4.2.20)  $f'_N(\mu_{t+1} \mid m'_{t+1}, \sigma^2/n'_{t+1})$,

where

(4.2.21)  $m'_{t+1} = m''_t + u$

and

(4.2.22)  $n'_{t+1} = [n''_t n_s/(n''_t + n_s)] < n''_t$.

Winkler and Barry (1973) present a study of the limiting behavior of $n'_t$ in the case of a nonstationary normal process with unknown mean $\mu$ when $\sigma^2$ is known. They show that

(4.2.23)  $\lim_{t \to \infty} n'_t = n_L = (n/2)\{[1 + (4n_s/n)]^{1/2} - 1\}$.

For this result, it was assumed that the sample size is the same in each
period, i.e., $n_t = n\ \forall t$. It is possible to contrast this limiting
result with that of the stationary case. The stationary case can be thought
of as a limiting form of the nonstationary case with $u = 0$ and
$n_s = \infty$. Since $e_t \sim N(u, \sigma^2/n_s)$, if $n_s = \infty$ and
$u = 0$ then $e_t \sim N(0, 0)$. In other words, $e_t = 0\ \forall t$.

In the nonstationary case $n'_{t+1} < n''_t = n'_t + n$. This inequality can
be interpreted as showing that the presence of nonstationarity produces
greater uncertainty (variance) at the start of period $t+1$ than would be
present under stationarity, since in the stationary case $n'_{t+1} = n''_t$.
Because of the additional uncertainty involving the shifts in the parameter
$\mu$, the distribution of $\mu$ does not necessarily become tighter as $t$
increases (as it does under stationary conditions). In fact, if $n'_1$, the
initial value of $n'$, is larger than the limiting value $n_L$, the variance
of the posterior distribution of $\mu$ increases as $t$ increases. In this
case there is initially a great deal of information concerning $\mu_1$. Even
though the observations in the first period yield yet further information
concerning $\mu_1$, the random shock at the end of the period is strong
enough to imply that there is less information about $\mu_2$ at the
beginning of the second period than there was about $\mu_1$ at the beginning
of the first period. On the other hand, if $n'_1$ is less than $n_L$, then
the information obtained each period "overrides" the uncertainty caused by
the random shock, in a sense, and there is more information about $\mu_2$ at
the beginning of the second period than there was about $\mu_1$ at the
beginning of the first period.
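
The two regimes just described can be illustrated numerically. The sketch
below is only an illustration under a stated assumption: it takes the
period-to-period step to be $1/n'_{t+1} = 1/n''_t + 1/n_s$ with
$n''_t = n'_t + n$, which is what results from adding the shock variance
$\sigma^2/n_s$ to the posterior variance $\sigma^2/n''_t$. The function names
and the numerical values are hypothetical and are not taken from the text.

```python
import math

def next_prior_n(n_prior, n, n_s):
    """Advance n' by one period: sample of size n, then the random shock."""
    n_post = n_prior + n                      # n''_t = n'_t + n
    # Assumed shock step: variances add, so 1/n'_{t+1} = 1/n''_t + 1/n_s.
    return n_post * n_s / (n_post + n_s)

def limiting_n(n, n_s):
    """Fixed point n_L of the recursion: positive root of x**2 + n*x - n*n_s = 0."""
    return (-n + math.sqrt(n * n + 4.0 * n * n_s)) / 2.0

def trajectory(n1_prior, n, n_s, periods):
    """Return [n'_1, n'_2, ..., n'_{periods+1}]."""
    path = [n1_prior]
    for _ in range(periods):
        path.append(next_prior_n(path[-1], n, n_s))
    return path

# Illustrative values only (not from the text).
n, n_s = 5.0, 20.0
n_L = limiting_n(n, n_s)
from_above = trajectory(50.0, n, n_s, 10)   # n'_1 > n_L: n' falls toward n_L
from_below = trajectory(1.0, n, n_s, 10)    # n'_1 < n_L: n' rises toward n_L
print(round(n_L, 4))
print([round(x, 3) for x in from_above[:4]])
print([round(x, 3) for x in from_below[:4]])
```

Under this assumed recursion a start above $n_L$ loses information from one
period to the next (the variance $\sigma^2/n'_t$ grows), while a start below
$n_L$ gains information, in line with the verbal argument.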

To investigate the behavior of the sequence $(m'_t)$ assume as before that
the sample size in each period is $n$. In addition, to obtain a simpler
expression for comparisons with the stationary case, assume that 1) the mean
of the distribution of the random shock is zero, i.e., $u = 0$, and 2) at
the beginning of the first period the model is already in steady state form
in the sense that $n'_1 = n_L$, so that the sequence of variances
$(\sigma^2/n'_t)$ will be a constant sequence (once the process reaches the
limit $n_L$ it remains there).

Based on these assumptions, from (4.2.21) and (4.2.2), $m'_{t+1}$ can be
expressed in the form

(4.2.24)      $m'_{t+1} = q\,m'_t + (1-q)\,m_t$.

The result can be motivated as follows: it is assumed that $n_t = n$ and
$n'_1 = n_L$, which implies $n'_t = n_L$; therefore it follows that the
posterior mean can be expressed by

              $m'_{t+1} = m''_t = (n'_t m'_t + n_t m_t)/(n'_t + n_t)$
                        $= (n_L m'_t + n\,m_t)/(n_L + n)$
                        $= [n_L/(n_L + n)]\,m'_t + [n/(n_L + n)]\,m_t$.

Defining $q$ as in (4.2.25), it follows that
$m'_{t+1} = q\,m'_t + (1-q)\,m_t$, where

(4.2.25)      $q = n_L/(n_L + n)$.

Note that $0 < q < 1$. When we successively apply (4.2.24), $m'_{t+1}$
becomes a function of $m'_1$ (the initial mean), $m_i$ (the sample means)
and of $q$. The prior mean of the unknown parameter $\mu$ after $t$ periods
of time have elapsed can be written as

(4.2.26)      $m'_{t+1} = q^t\,m'_1 + (1-q) \sum_{i=0}^{t-1} q^i\,m_{t-i}$.

Unlike the stationary case, the sequence $(m'_t)$ does not have a limit.
The prior mean at the beginning of any period, under nonstationarity, can be
expressed as the sum of the initial mean, $m'_1$, discounted by a factor
$q^t$, and an exponentially weighted sum of the observed sample means. Since
$q$ is a constant less than one, $q^t < \dots < q^2 < q^1$. Thus as we move
into the future the initial prior mean has less weight in the determination
of the prior mean $m'_t$. From the exponentially weighted sum of sample
means we note that recent observations are weighted more heavily than less
recent ones. The impact of a particular sample mean on future values of the
prior distribution of $\mu$ decreases as $t$ increases.
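
The equivalence between the one-step recursion (4.2.24) and the expanded
form (4.2.26) can be checked with a short sketch. The function names and the
sample means below are hypothetical illustrations; only the two formulas
themselves come from the text.

```python
def recursive_prior_mean(m1_prior, sample_means, q):
    """Apply m'_{t+1} = q*m'_t + (1-q)*m_t once per period; return m'_{t+1}."""
    m_prior = m1_prior
    for m_t in sample_means:
        m_prior = q * m_prior + (1.0 - q) * m_t
    return m_prior

def closed_form_prior_mean(m1_prior, sample_means, q):
    """Evaluate q**t * m'_1 + (1-q) * sum_{i=0}^{t-1} q**i * m_{t-i}."""
    t = len(sample_means)
    # sample_means[t - 1 - i] is m_{t-i}, because m_1 sits at index 0.
    weighted = sum(q ** i * sample_means[t - 1 - i] for i in range(t))
    return q ** t * m1_prior + (1.0 - q) * weighted

def weights(t, q):
    """Weight on m'_1 followed by the weights on m_1, ..., m_t."""
    return [q ** t] + [(1.0 - q) * q ** (t - k) for k in range(1, t + 1)]

# Hypothetical inputs.
means = [10.0, 12.0, 11.0, 15.0]
q = 0.6
rec = recursive_prior_mean(8.0, means, q)
closed = closed_form_prior_mean(8.0, means, q)
w = weights(len(means), q)
print(rec, closed, w)
```

The weights sum to one, and the weights on the sample means grow toward the
most recent period, which is the exponential weighting described above.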

Under the same assumptions that we used to present the limiting results of
$n'_t$ and $m'_t$, the mean and variance of the normal predictive
distribution when the data generating process is normal are given by

(4.2.27)      $E(x_t) = m''_t$  [as defined in (4.2.26)],

and

(4.2.28)      $\mathrm{Var}(x_t) = \sigma^2 (1 + n_L)/n_L$,

respectively. Similarly, when the data generating process is lognormal the
mean and variance of the lognormal predictive distribution are given by

(4.2.29)      $E(x_t) = \exp[m''_t + \sigma^2 (1 + n_L)/2n_L]$,

and

(4.2.30)      $\mathrm{Var}(x_t) = \exp(2m''_t)\,(w^2 - w)$,

where $w = \exp[\sigma^2 (1 + n_L)/n_L]$.
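
As a hedged numerical illustration of the steady-state predictive moments,
the sketch below evaluates the normal case and the lognormal case from the
same inputs. It relies on the standard lognormal identities (if $\ln x$ is
normal with mean $m$ and variance $s^2$, then $E(x) = \exp(m + s^2/2)$ and
$\mathrm{Var}(x) = \exp(2m)(w^2 - w)$ with $w = \exp(s^2)$). The function and
argument names are assumptions made for the example.

```python
import math

def normal_predictive_moments(m, sigma2, n_L):
    """Mean and variance of the normal predictive distribution in steady state."""
    s2 = sigma2 * (1.0 + n_L) / n_L
    return m, s2

def lognormal_predictive_moments(m, sigma2, n_L):
    """Mean and variance of the lognormal predictive distribution in steady state."""
    s2 = sigma2 * (1.0 + n_L) / n_L        # predictive variance on the log scale
    w = math.exp(s2)
    mean = math.exp(m + s2 / 2.0)
    var = math.exp(2.0 * m) * (w * w - w)
    return mean, var

# Hypothetical inputs.
mean_n, var_n = normal_predictive_moments(0.2, 0.3, 4.0)
mean_ln, var_ln = lognormal_predictive_moments(0.2, 0.3, 4.0)
print(mean_n, var_n, mean_ln, var_ln)
```

In the normal case only the variance depends on $n_L$; in the lognormal case
both moments depend on the location and on $n_L$, since the mean itself
contains the log-scale variance.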

The additional uncertainty involving the shifts in the parameter $\mu$
affects the predictive distribution of the random variable depending on how
the initial parameter $n'_1$ relates to the limiting value $n_L$. If $n'_1$
is larger than $n_L$, the variance of the predictive distribution for normal
processes, $\sigma^2 (1 + n''_t)/n''_t$, increases as $t$ increases. Again
there is initially a great amount of information concerning $x$. The
information obtained each period from the sample is not strong enough to
override the uncertainty caused by the random shock. There is not a similar
effect in the variance of the predictive distribution for lognormal
processes since it depends on both parameters $m''_t$ and $n''_t$. The
expected value of the predictive distribution for normal cases does not have
a bound. It is influenced heavily by the most recent sample means. The
expected value of the predictive distribution for lognormal cases also
depends on both parameters $m''_t$ and $n''_t$.

4.2.2 Limiting Behavior of $m'_t$, $n'_t$, $v'_t$ and $d'_t$ When Both
Parameters $\mu$ and $\sigma^2$ Are Unknown

The most involved of the normal or lognormal cases is, quite naturally, that
in which neither $\mu$ nor $\sigma^2$ is known. It is clear that we shall
have to assign $(\mu, \sigma^2)$ a bivariate prior density function. The
natural conjugate prior density function of $(\mu, \sigma^2)$ is the
normal-gamma-2 with parameters $m'$, $v'$, $n'$, $d'$. If the mean and
variance of the data generating process are stationary and sample
information arrives each period then each posterior can be thought of as a
prior with respect to the following sample. In general, if we assume that a
sample of size $n_t$ is employed every time a sample is taken, that the
sample yields sufficient statistics $m_t$, $n_t$, $v_t$ and $d_t$, and that
the parameters do not change, then in any given period $t$ the bivariate
distribution of $(\mu, \sigma^2)$ is normal-gamma-2 with parameters $m''_t$,
$n''_t$, $v''_t$ and $d''_t$ given by

(4.2.31)      $m''_t = (n'_t m'_t + n_t m_t)/(n'_t + n_t)$,

(4.2.32)      $n''_t = n'_t + n_t$,

(4.2.33)      $v''_t = [d'_t v'_t + n'_t (m'_t)^2 + d_t v_t + n_t m_t^2
                        - n''_t (m''_t)^2]/(d'_t + n_t)$,

and

(4.2.34)      $d''_t = d'_t + n_t$.
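
A minimal sketch of this one-period update follows. It is an illustration
only: the function names, argument names and input values are hypothetical.
The second function evaluates the compact expression
$[d'v' + dv + n n'(m' - m)^2/(n' + n)]/(d' + n)$, which is algebraically the
same quantity as (4.2.33), so the two can be compared numerically.

```python
def normal_gamma2_update(m_prior, n_prior, v_prior, d_prior,
                         m_obs, n_obs, v_obs, d_obs):
    """Posterior (m'', n'', v'', d'') from prior parameters and sample statistics."""
    n_post = n_prior + n_obs
    m_post = (n_prior * m_prior + n_obs * m_obs) / n_post
    d_post = d_prior + n_obs
    v_post = (d_prior * v_prior + n_prior * m_prior ** 2
              + d_obs * v_obs + n_obs * m_obs ** 2
              - n_post * m_post ** 2) / d_post
    return m_post, n_post, v_post, d_post

def v_posterior_compact(m_prior, n_prior, v_prior, d_prior,
                        m_obs, n_obs, v_obs, d_obs):
    """Same v'' written with the squared difference between m' and m."""
    cross = n_obs * n_prior * (m_prior - m_obs) ** 2 / (n_prior + n_obs)
    return (d_prior * v_prior + d_obs * v_obs + cross) / (d_prior + n_obs)

# Hypothetical prior parameters and sample statistics.
args = (2.0, 3.0, 1.5, 4.0, 2.6, 5.0, 1.2, 4.0)
m_post, n_post, v_post, d_post = normal_gamma2_update(*args)
v_compact = v_posterior_compact(*args)
print(m_post, n_post, v_post, d_post, v_compact)
```

The compact form makes visible that the posterior variance estimate pools
the prior estimate, the sample variance, and a term penalizing disagreement
between the prior mean and the sample mean.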


To study the limiting behavior of $m'_t$, $v'_t$, $n'_t$ and $d'_t$ under
stationary and nonstationary conditions we will make the assumptions that
$n_t = n\ \forall t$ and that $d_t = d\ \forall t$. After $t$ samples are
taken the sufficient statistics

              $(m_1, v_1, n, d),\ (m_2, v_2, n, d),\ \dots,\ (m_t, v_t, n, d)$

are available. The characterization and consequently the limiting behavior
of $m'_t$ and $n'_t$ are identical to the ones presented for the case when
$\mu$ is the only unknown parameter [see equations (4.2.9) and (4.2.10) for
the stationary conditions and (4.2.23) and (4.2.26) for nonstationary
conditions]. The characterization of the parameter $d'_t$ is rather simple.
Under stationarity and nonstationarity the parameter $d'_{t+1}$ is equal to
the parameter $d''_t$. After $t$ periods of time the following relation
holds,

(4.2.35)      $d''_t = d'_1 + tn$.

The parameter $d''_t$ approaches infinity as $t$ approaches infinity.

The characterization of the parameter $v''_t$ is more involved. Before
considering the characterization of $v''_t$ a transformation of the original
expression is to be presented. Expression (4.2.33) could be rewritten as

(4.2.36)      $v''_t = \frac{d'_t v'_t + d v_t}{d''_t}
                + \frac{n'_t (m'_t)^2 + n m_t^2
                - (n'_t + n)\left[\frac{(n'_t m'_t + n m_t)^2}{(n'_t + n)^2}\right]}{d''_t}$

or

(4.2.37)      $v''_t = \frac{d'_t v'_t + d v_t}{d''_t}
                + \frac{n'_t (m'_t)^2 + n m_t^2
                - \left[\frac{(n'_t)^2 (m'_t)^2 + n^2 m_t^2 + 2 n'_t n\, m'_t m_t}{n'_t + n}\right]}{d''_t}$.

Combining terms and simplifying, (4.2.37) becomes

(4.2.38)      $v''_t = \frac{d'_t v'_t + d v_t}{d''_t}
                + \frac{\left[\frac{n'_t n\,((m'_t)^2 + m_t^2 - 2 m'_t m_t)}{n'_t + n}\right]}{d''_t}$

or

(4.2.39)      $v''_t = \left[d'_t v'_t + d v_t
                + \frac{n\,n'_t (m'_t - m_t)^2}{n'_t + n}\right] / (d'_t + n)$.

It can be noted that, given $\sigma^2$,

(4.2.40)      $V(m_t) = E[m_t - m'_t]^2 = \frac{n'_t + n}{n'_t n}\,\sigma^2$;

so that

(4.2.41)      $E\left[\frac{n'_t n\,(m_t - m'_t)^2}{n'_t + n}\right] = \sigma^2$

[see Raiffa and Schlaifer (1961)]. Thus, assuming that $v'_t$ and $v_t$ are
obtained as unbiased estimators of $\sigma^2$, unbiasedness in $v''_t$ is
preserved by the inclusion of the third term in the numerator of $v''_t$.
$v'_t$ will only be unbiased if it was based on a noninformative prior at
time $t = 0$. Otherwise it is biased by prior information.

Now consider the characterization of $v''_t$ as defined in (4.2.39). In
period one the posterior value of $v$ is given by

(4.2.42)      $v''_1 = \frac{d'_1 v'_1}{d'_1 + n} + \frac{d v_1}{d'_1 + n}
                + \frac{n'_1 n\,(m_1 - m'_1)^2}{(n'_1 + n)(d'_1 + n)}$;

in period two the posterior value of $v$ is given by

(4.2.43)      $v''_2 = \underbrace{\frac{d'_1 v'_1}{(d'_1+n)(d'_1+2n)}
                + \frac{d v_1}{(d'_1+n)(d'_1+2n)}
                + \frac{d v_2}{d'_1+2n}}_{(a)}$
              $\quad + \underbrace{\frac{n'_1 n\,(m_1 - m'_1)^2}{(n'_1+n)(d'_1+n)(d'_1+2n)}
                + \frac{n'_2 n\,(m_2 - m'_2)^2}{(n'_2+n)(d'_1+2n)}}_{(b)}$.

In period three the posterior value of $v$ is given by

(4.2.44)      $v''_3 = \underbrace{\frac{d'_1 v'_1}{(d'_1+n)(d'_1+2n)(d'_1+3n)}
                + \frac{d v_1}{(d'_1+n)(d'_1+2n)(d'_1+3n)}
                + \frac{d v_2}{(d'_1+2n)(d'_1+3n)}
                + \frac{d v_3}{d'_1+3n}}_{(a)}$
              $\quad + \underbrace{\frac{n'_1 n\,(m_1 - m'_1)^2}{(n'_1+n)(d'_1+n)(d'_1+2n)(d'_1+3n)}
                + \frac{n'_2 n\,(m_2 - m'_2)^2}{(n'_2+n)(d'_1+2n)(d'_1+3n)}
                + \frac{n'_3 n\,(m_3 - m'_3)^2}{(n'_3+n)(d'_1+3n)}}_{(b)}$.


In any given period $t$ the value of $v''_t$ depends on the stationarity
condition. Let $v''_t$ be the sum of two terms (a) and (b) as defined in
(4.2.43) and (4.2.44). Term (a) does not include parameters that depend on
the nonstationarity assumption but term (b) does. It has been pointed out
many times before that $n'_t$ is affected by the nonstationarity assumption.

To study the limiting behavior of $v''_t$ under conditions of
nonstationarity it is assumed, as in a previous example, that $n'_1 = n_L$
and consequently $n'_t = n_L\ \forall t$. Define $P = n_L n/(n_L + n)$.
Based on the assumptions and the definition of $P$, expressions (4.2.42) and
(4.2.43) will be written as

(4.2.45)      $v''_1 = \frac{d'_1 v'_1}{d'_1 + n} + \frac{d v_1}{d'_1 + n}
                + \frac{P (m_1 - m'_1)^2}{d'_1 + n}$

and

(4.2.46)      $v''_2 = \frac{d'_1 v'_1}{(d'_1+n)(d'_1+2n)}
                + \frac{d v_1}{(d'_1+n)(d'_1+2n)} + \frac{d v_2}{d'_1+2n}$
              $\quad + \frac{P (m_1 - m'_1)^2}{(d'_1+n)(d'_1+2n)}
                + \frac{P (m_2 - m'_2)^2}{d'_1+2n}$.

In general $v''_t$ could be written

(4.2.47)      $v''_t = \frac{d'_1 v'_1}{\prod_{i=1}^{t} (d'_1 + in)}
                + d\left[\sum_{i=1}^{t} \frac{v_i}{\prod_{j=i}^{t} (d'_1 + jn)}\right]
                + P\left[\sum_{i=1}^{t} \frac{(m_i - m'_i)^2}{\prod_{j=i}^{t} (d'_1 + jn)}\right]$.


We want to show that expression (4.2.47) has a limit. We were not able to
simplify expression (4.2.47) to find the limiting value of $v''_t$ directly.
However, we can show that

(4.2.48)      $\lim_{t \to \infty} v''_t = \sigma^2$  w.p.1.

1. We want to show that there is a sequence $\{v''_{(t)}\}$ of estimates of
the unknown variance which converges to $\sigma^2$ w.p.1. Suppose that at a
given point in time $t$ there is available a prior distribution of
$\sigma^2$:

(4.2.49)      $f'_{\gamma 2}(1/\sigma^2 \mid v', d')
                = \frac{\exp(-v'd'/2\sigma^2)\,(v'd'/2\sigma^2)^{d'/2 - 1}\,(v'd'/2)}{\Gamma(d'/2)}$

(4.2.50)      $\propto \exp(-d'v'/2\sigma^2)\,(1/\sigma^2)^{d'/2 - 1}$,

and suppose that there are available $t$ sample variances, each computed
from a sample of size $n$, $(v_1, \dots, v_t)$. The likelihood of this
vector of sample variances is

(4.2.51)      $f(v \mid \sigma^2) = \prod_{i=1}^{t} f_{\chi^2}[v_i \mid n-1, \sigma^2]$

(4.2.52)      $\propto \prod_{i=1}^{t} \left[(1/\sigma^2)^{(n-1)/2}\right]
                \left[\exp(-(n-1) v_i/2\sigma^2)\right]$,

where factors not involving $\sigma^2$ have been dropped. The posterior
distribution of $\sigma^2$ could be written as

(4.2.53)      $f'(\sigma^2)\,f(v \mid \sigma^2) \propto
                \exp(-v'd'/2\sigma^2)\,(1/\sigma^2)^{d'/2 - 1}\,
                \exp\left(-(n-1) \sum_{i=1}^{t} v_i / 2\sigma^2\right)
                (1/\sigma^2)^{t(n-1)/2}$

(4.2.54)      $\propto \exp\left(-\left[v'd' + (n-1) \sum_{i=1}^{t} v_i\right]/2\sigma^2\right)
                (1/\sigma^2)^{(tn - t)/2 + d'/2 - 1}$

(4.2.55)      $\propto \exp\left(-\frac{1}{2\sigma^2}\left[(n-1) \sum_{i=1}^{t} v_i + d'v'\right]\right)
                (1/\sigma^2)^{(tn - t + d')/2 - 1}$.


Define

(4.2.56)      $d''_t = d' + t(n-1)$

and

(4.2.57)      $v''_{(t)} = \left[\sum_{i=1}^{t} (n-1) v_i + d'v'\right] / [d' + t(n-1)]$.

The posterior distribution of $\sigma^2$ could be rewritten as

(4.2.58)      $f''(\sigma^2 \mid v''_{(t)}, d''_t)
                = \frac{\exp(-d''_t v''_{(t)}/2\sigma^2)\,
                (d''_t v''_{(t)}/2\sigma^2)^{d''_t/2 - 1}\,(d''_t v''_{(t)}/2)}{\Gamma(d''_t/2)}$.


Now let us look at the limiting behavior of $v''_{(t)}$:

              $\lim_{t \to \infty} v''_{(t)}
                = \lim_{t \to \infty} \frac{\sum_{i=1}^{t} (n-1) v_i + d'v'}{d' + t(n-1)}
                = \lim_{t \to \infty} \frac{\sum_{i=1}^{t} (n-1) v_i}{t(n-1)}$.

From sampling theory it is known that if the sample variance is defined to
be $v_t = \sum_{i=1}^{n} (x_{ti} - m_t)^2/(n-1)$ then
$E(v_t \mid \mu_t, \sigma^2) = \sigma^2$ and
$V(v_t \mid \mu_t, \sigma^2) = 2\sigma^4/(n-1)$. Assuming that
$v_1, \dots, v_t$ are i.i.d., then $E(\sum_{i=1}^{t} v_i/t) = \sigma^2$ and
$V(\sum_{i=1}^{t} v_i/t) = \mathrm{Var}(v_i)/t = 2\sigma^4/[t(n-1)] \to 0$
as $t \to \infty$. Therefore
$\lim_{t \to \infty} \sum_{i=1}^{t} v_i/t = \sigma^2$ w.p.1, and

              $\lim_{t \to \infty} v''_{(t)}
                = \lim_{t \to \infty} \sum_{i=1}^{t} v_i/t = \sigma^2$  w.p.1.
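
The convergence of the pooled estimate can be illustrated by simulation. The
following is a rough sketch under stated assumptions, not a proof: the prior
values, the true $\sigma$, the sample size and the size of the random shifts
in the mean are all hypothetical. The mean is allowed to shift between
periods to illustrate that within-period sample variances are unaffected by
such shifts.

```python
import random
import statistics

def pooled_variance_path(sigma, n, periods, v_prior, d_prior,
                         shock_sd=0.0, seed=0):
    """Return [v''_(1), ..., v''_(periods)] from simulated normal samples."""
    rng = random.Random(seed)
    mu = 0.0
    weighted_sum = 0.0                      # running sum of (n-1) * v_i
    path = []
    for t in range(1, periods + 1):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        weighted_sum += (n - 1) * statistics.variance(xs)   # v_t uses n-1
        path.append((weighted_sum + d_prior * v_prior)
                    / (d_prior + t * (n - 1)))
        mu += rng.gauss(0.0, shock_sd)      # random shift of the mean
    return path

# Hypothetical settings: true variance is 4.0, prior estimate is 10.0.
path = pooled_variance_path(sigma=2.0, n=5, periods=4000,
                            v_prior=10.0, d_prior=3.0,
                            shock_sd=0.5, seed=1)
print(path[0], path[99], path[-1])
```

Early values of the path are pulled toward the prior estimate; as periods
accumulate the prior term is swamped and the path settles near the true
variance.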


We have shown that there is a sequence $\{v''_{(t)}\}$, as defined in
(4.2.57), which converges to $\sigma^2$. Moreover, the Bayesian can observe
$\{v''_{(t)}\}$; therefore by observing $\{v''_{(t)}\}$ he comes to know
$\sigma^2$.

2. In the limit, since he knows $\sigma^2$, his limiting posterior
distribution of $\sigma^2$ must be degenerate at
$\lim v''_{(t)} = \sigma^2$.

3. Raiffa and Schlaifer (1961) show that the mean of the gamma-2 posterior
distribution of $(1/\sigma^2)$ is equal to the inverse of the posterior
estimate of the variance, as defined by

(4.2.59)      $E''_t(1/\sigma^2) = 1/v''_t$.

4. Therefore by (2) and (3), it must be true that

(4.2.60)      $\lim v''_t = \lim v''_{(t)} = \sigma^2$  w.p.1.

Observe that the argument presented above applies to both stationary and
nonstationary cases. Savage (1971) summarizes informally the argument we
have presented.

4.3 Prediction Intervals for Normal, Student, Lognormal
and LogStudent Distributions

Bayesian analysis is generally concerned with the past only insofar as it
relates to the present and future; interest is with the current situation
and how it relates to what might happen rather than with what did happen.
Above all, it is concerned with creating a meaningful view of the future in
the minds of people who make decisions. The Bayesian methods, however,
include people explicitly: the person responsible for the analysis and all
the people concerned with using the output information and supplying
information relevant to the resulting actions. Apart from the fact that
classical analyses often ignore external information and apart from the fact
that the statistical criterion is usually far from reflecting the decision
loss function, the classical analysis often neglects the people who will
communicate with each other and the model. People have sources of
information quite beyond the data; for example, they may know perfectly well
that a competing product is being introduced, that a new technology has been
developed, or that the President is planning to sign new legislation that
will affect the marketing of their product. The effects of such events can
often be well foreseen in a qualitative or subjective sense, but they may
nevertheless be difficult to express and may require probability
distributions to describe the uncertainty surrounding them. It is necessary
that people can communicate their information to the method and that the
method clearly communicates the uncertain information in such a way that it
is readily interpreted and used by decision makers. The nonstationary model
that we developed in Chapter Three for normal and lognormal processes
incorporates prior distributions on the unknown parameters to reflect the
decision maker's information.

In a Bayesian analysis, the information coming from the data is contained in
the posterior distribution of the unknown parameter. One way to partially
summarize the information contained in the posterior distribution is to
quote one or more intervals which contain stated amounts of probability. For
the classical statistician, the information coming from the data is
contained in the sampling distribution. He can summarize the information in
the sampling distribution by quoting intervals with confidence coefficient
$\gamma$. Suppose that $x_1, \dots, x_n$ form a random sample from a
distribution which involves a parameter $\theta$ whose value is unknown.
Suppose also that two statistics $T_1(x_1, \dots, x_n)$ and
$T_2(x_1, \dots, x_n)$ can be found such that, no matter what the value of
$\theta$ may be,

(4.3.1)      $\Pr[T_1(x_1, \dots, x_n) < \theta < T_2(x_1, \dots, x_n) \mid \theta] = \gamma$,

where $\gamma$ is a fixed probability ($0 < \gamma < 1$). If the observed
values of $T_1(x_1, \dots, x_n)$ and $T_2(x_1, \dots, x_n)$ are $a$ and $b$,
then it is said that the interval $(a,b)$ is a confidence interval for
$\theta$ with confidence coefficient $\gamma$, or, in other words, that the
interval $(a,b)$ contains $\theta$ with confidence $\gamma$. The uncertainty
pertains to the interval, and not to $\theta$.

It is not correct to state that $\theta$ lies in the interval $(a,b)$ with
probability $\gamma$. Before the values of the statistics
$T_1(x_1, \dots, x_n)$ and $T_2(x_1, \dots, x_n)$ are observed, those
statistics are random variables. It follows therefore from (4.3.1) that
$\theta$ will lie in the random interval having end points
$T_1(x_1, \dots, x_n)$ and $T_2(x_1, \dots, x_n)$ with probability $\gamma$.
After the specific values $T_1(x_1, \dots, x_n) = a$ and
$T_2(x_1, \dots, x_n) = b$ have been observed, it is not possible to assign
a probability to the event that $\theta$ lies in the specific interval
$(a,b)$ without regarding $\theta$ as a random variable which itself has a
probability distribution. In other words, it is necessary first to assign a
prior distribution to $\theta$ and then to use the resulting posterior
distribution to calculate the probability that $\theta$ lies in the interval
$(a,b)$. Rather than assigning a prior distribution to the parameter
$\theta$, classical statisticians have preferred to state that there is
confidence $\gamma$, rather than probability $\gamma$, that $\theta$ lies in
the interval $(a,b)$.

To a classicist, any given confidence interval statement is either correct
(in which case the probability that it is correct is 1.0) or incorrect (in
which case the probability that it is correct is 0.0). That is, a confidence
interval is one type of interval estimate that has the feature that in
repeated sampling a known proportion (for instance, 95%) of the intervals
computed by a given method will include the population parameter. This
concept has a shortcoming since, although the particular sample values that
are observed may give the experimenter additional information about whether
or not the interval formed from these particular values actually does
include $\theta$, there is no way to adjust the confidence coefficient
$\gamma$ in the light of this new information. To differentiate between the
two statements, usually the classical interval estimate is called a
"confidence interval" and the Bayesian interval is called a "credible
interval". Users of classical intervals tend to interpret them in the
subjective sense as probability statements about a random variable $\theta$
despite the classical statistician's emphasis on the frequency
interpretation. Pratt (1965) has observed that people should not be blamed
for this misinterpretation, since the correct interpretation ($[a,b]$ is the
interval which, before the observations are obtained, had probability
$\gamma$ of covering $\theta$) is simply not relevant to people concerned
solely with $\theta$ and not with the observations, whose only role is to
furnish information about $\theta$. The classical approach often requires
that $\gamma$, the probability associated with the interval estimate, be
chosen in advance of sampling. The Bayesian may wish to look at intervals
for several different values of $\gamma$ (not necessarily chosen in
advance).
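
The repeated-sampling reading of a confidence coefficient can be made
concrete with a small simulation. The sketch below is a hedged illustration:
it assumes a normal sample with known standard deviation and the usual
interval for the mean, and the function name and settings are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def coverage(theta, sigma, n, gamma, trials, seed=0):
    """Fraction of simulated intervals xbar +/- z*sigma/sqrt(n) that cover theta."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + gamma / 2.0)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(theta, sigma) for _ in range(n))
        if xbar - half_width < theta < xbar + half_width:
            hits += 1
    return hits / trials

# Hypothetical settings.
cov = coverage(theta=3.0, sigma=2.0, n=10, gamma=0.95, trials=20000, seed=1)
print(cov)
```

The proportion refers to the procedure over many samples; any single
computed interval either covers the fixed $\theta$ or it does not, which is
the distinction drawn in the text.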

A rather interesting situation arises when an interval is needed to show a
range within which most of the distribution lies. In searching for ways to
summarize the information in the posterior distribution $P(\theta \mid x)$,
where $\theta$ is the unknown parameter and $x$ is the vector of
observations, it is to be noted that, although the interval over which the
posterior density is nonzero may extend over infinite ranges in the
parameter space, nevertheless over a substantial part of the parameter space
the density may be negligible. Thus it may be possible to construct a
relatively small interval which contains most of the probability or to
construct a number of intervals which contain various stated proportions of
the total probability. There are an infinite number of ways in which these
intervals can be constructed.

In some applications, two properties are desirable for such intervals:

1)  the probability density of every point inside the interval is at least
as large as that of any point outside it, and

2)  for a given probability content the interval should be as short as
possible.*

Intervals which have these properties have been called highest posterior
density (H.P.D.) intervals. The normal, lognormal, Student and logStudent
distributions all permit H.P.D. intervals. Moreover for these distributions,
as for any unimodal distribution, the H.P.D. interval of content $\gamma$ is
unique. Throughout the discussion in the previous paragraphs we assumed that
there was only one unknown parameter. If we are referring to a vector of
unknown parameters, i.e., $\theta = (\theta_1, \theta_2)$, all that can be
known about $\theta$ is contained in the joint posterior bivariate
distribution. Mathematically speaking, therefore, the problem of making
inferences about $\theta$ is solved as soon as the posterior distribution is
written. As soon as we consider more than one unknown parameter we refer to
highest posterior density (H.P.D.) regions instead of H.P.D. intervals. As
with H.P.D. intervals, the region should be such that the probability
density of every point inside it is at least as large as that of any point
outside it or the region should be such that, for a given probability
content, it occupies the smallest volume in the parameter space.

* Sometimes in order to have the smallest total width one must ...
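
For a skewed distribution such as the lognormal the H.P.D. interval is not
the equal-tailed interval. The sketch below is one simple way to approximate
it, assuming $\ln X$ is normal with mean $m$ and standard deviation $s$: it
searches over the probability left below the interval and keeps the shortest
interval of content $\gamma$. The grid search, function names and inputs are
illustrative assumptions, not a method taken from the text.

```python
import math
from statistics import NormalDist

def lognormal_hpd(m, s, gamma, steps=20000):
    """Shortest interval (a, b) of content gamma when ln X ~ N(m, s**2)."""
    z = NormalDist()
    best = None
    for k in range(1, steps):
        p = (1.0 - gamma) * k / steps          # probability below the interval
        a = math.exp(m + s * z.inv_cdf(p))
        b = math.exp(m + s * z.inv_cdf(p + gamma))
        if best is None or b - a < best[1] - best[0]:
            best = (a, b)
    return best

def lognormal_pdf(x, m, s):
    """Density of X at x > 0 when ln X ~ N(m, s**2)."""
    u = (math.log(x) - m) / s
    return math.exp(-0.5 * u * u) / (x * s * math.sqrt(2.0 * math.pi))

# Hypothetical inputs.
a, b = lognormal_hpd(0.0, 0.5, 0.90)
equal_tail = (math.exp(0.5 * NormalDist().inv_cdf(0.05)),
              math.exp(0.5 * NormalDist().inv_cdf(0.95)))
print(a, b, equal_tail)
```

At the shortest interval the density is approximately equal at the two end
points, which ties property 2 back to property 1 for a unimodal density.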

We emphasized in Chapter Three that one of the purposes of prediction is
often to provide some estimate, either point or interval, for future
observations of an experiment F based on the results obtained from an
informative experiment E. In other words, in addition to being interested in
the posterior distribution of the unknown parameters we are interested in
the distribution of further samples or observations. For instance, it is
sometimes of interest to obtain a value, arrived at by life testing, that
with high probability will be less than the life length of a particular
component that is to be used in a system. Or, on the basis of annual profits
in previous years, a firm is interested in having an estimate, in interval
form, of the profits for the coming year.

These are examples of statistical inference problems called prediction
intervals or $\beta$-expectation tolerance intervals. The problem can be
stated more formally as follows [see Aitchison and Sculthorpe (1965) and
Fraser and Guttman (1956)]. Suppose an informative experiment has been
performed. A random sample $x_1, x_2, \dots, x_n$ is taken from a
distribution that belongs to the class of density functions
$\{p_E(\cdot \mid \theta) : \theta \in \Theta\}$, say $f(x \mid \theta)$.
Also assume that there is a future experiment F, which consists of taking a
random sample Y, for which a prediction of some sort is required and that
the possible probabilistic descriptions of F form the class of density
functions $\{p_F(\cdot \mid \theta) : \theta \in \Theta\}$. The densities
describing E and F are conditioned by the same parameter vector. It is
through this connection between E and F that E provides information about F.
Although E and F are connected by $\theta$, it is assumed that for given
$\theta$ they are statistically independent. On the basis of the sample
$x_1, \dots, x_n$ we wish to make a prediction about Y, usually in the form
of an interval or region that we are confident will contain the outcome of
Y. That is, if L and U are functions of $x_1, \dots, x_n$, then

(4.3.2)      $\Pr(L < Y < U) = \beta$,

or equivalently

(4.3.3)      $E\left\{\int_L^U f(y \mid \theta)\,dy\right\} = \beta$.

Aitchison and Sculthorpe (1965) classify the prediction problem in two
categories: in the first, a prediction is required for only one performance
of F; in the second, a series of replications of F is to be conducted and
then the prediction region, R, is to be used for each replication. Although
there could be more than one replication in a single time period and one can
still get prediction intervals for future replications from what we know
about m' and e, we are restricting ourselves to single replications. In
other words, each time that we find the predictive distribution we will be
concerned with one future experiment F. Faced with case one a Bayesian would
proceed to obtain $f(y \mid x)$, the posterior distribution of $y$ given
$x$. As we pointed out in Chapter Three, from a prior density $f(\theta)$ on
$\Theta$ the posterior density $f(\theta \mid x)$ is obtained in the usual
way and this is converted into $f(y \mid x)$ through the relation

(4.3.4)      $f(y \mid x) = \int_\Theta f_F(y \mid \theta)\,f(\theta \mid x)\,d\theta$.

$f(y \mid x)$ is called the predictive distribution of $y$.
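
The relation (4.3.4) can be checked numerically in the simplest conjugate
case. The sketch below assumes a normal process with known standard
deviation and a normal posterior for the mean with variance
$\sigma^2/n''$; it integrates over $\theta$ by the trapezoidal rule and
compares the result with a normal density whose variance is
$\sigma^2(1 + 1/n'')$. The function names, grid settings and inputs are
hypothetical.

```python
import math
from statistics import NormalDist

def predictive_density_numeric(y, m_post, n_post, sigma,
                               grid=4001, width=10.0):
    """Approximate the integral of f(y|theta) * f(theta|x) over theta."""
    post = NormalDist(m_post, sigma / math.sqrt(n_post))
    lo = m_post - width * post.stdev
    hi = m_post + width * post.stdev
    h = (hi - lo) / (grid - 1)
    total = 0.0
    for k in range(grid):
        theta = lo + k * h
        weight = 0.5 if k in (0, grid - 1) else 1.0   # trapezoid end weights
        total += weight * NormalDist(theta, sigma).pdf(y) * post.pdf(theta)
    return total * h

def predictive_density_closed(y, m_post, n_post, sigma):
    """Normal predictive density with variance sigma**2 * (1 + 1/n_post)."""
    return NormalDist(m_post, sigma * math.sqrt(1.0 + 1.0 / n_post)).pdf(y)

# Hypothetical inputs.
num = predictive_density_numeric(1.3, 1.0, 4.0, 2.0)
closed = predictive_density_closed(1.3, 1.0, 4.0, 2.0)
print(num, closed)
```

The agreement of the two values reflects that averaging the process density
over the posterior simply adds the posterior variance of the mean to the
process variance.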

Most of the literature on prediction intervals is concerned with solving for
prediction intervals of a particular type or solving for intervals for a
particular distribution. For instance, Thatcher (1964) found prediction
limits for binomial variables which do not depend on any assumptions about
the unknown proportion in the population. Hahn (1969) considers prediction
regions for k future Y observations when sampling from a normal
distribution. Shah (1969) and Nelson (1970) present a method for obtaining
prediction intervals for a Poisson variable and generate prediction limits
for the number of failures in one time interval by observing the failures in
the other time interval, provided both observations are subject to the same
Poisson law. Faulkenberry (1973) obtains a prediction interval for a random
variable Y based on the conditional distribution of y given a sufficient
statistic for the conditioning parameter. Aitchison (1966) considers the
construction of linear utility tolerance intervals which do take into
account how far inside or outside the interval a future observation y
happens to fall. From a Bayesian viewpoint, it is found that expected-cover
and linear utility intervals can be regarded as equivalent through a simple
relation between the expected cover and the relative cost ratio. For the
frequentist approach, it is first shown that linear-utility intervals can be
simply constructed for the normal and gamma distributions. Comparison of
these with expected-cover intervals shows that, while there is no complete
identity, there is an equivalence in a "large sample" sense.


Prediction intervals for future observations in life testing situations have
been derived also by Hewitt (1968), Nelson (1968), and Lawless (1971, 1972)
through the use of expected-cover tolerance regions. Dunsmore (1974) gives a
Bayesian approach to such situations and uses the concept of the Bayesian
predictive distribution. He considers both the exponential and the
two-parameter exponential distributions. As was pointed out in Chapter Two
we intend to use the same approach for the construction of prediction
intervals for the normal, lognormal, Student and logStudent distributions
under conditions of nonstationary shift parameters.

If the prior distribution is natural conjugate to the process
then the predictive distribution for normal processes is normal when
$\mu$ is unknown and $\sigma^2$ is known and is Student when $\mu$ and $\sigma^2$ are both
unknown. The determination of prediction intervals in general and H.P.D.
intervals in particular is easy due to the characteristics of both
distributions. The normal and Student distributions are similar in the
sense that they are unimodal, symmetric and bell shaped, with tails that
approach the axis asymptotically from minus infinity to plus infinity.
Graphically, the standardized Student distribution is flatter than the
normal distribution, with a larger portion of the area under the curve
located in the tails of the distribution. This implies that one must
proceed a greater distance along the number line away from the mean
under a standardized Student distribution to include any given percentage
of the area under the curve than would be the case for the standardized
normal distribution. Since both distributions are symmetric, to construct
H.P.D. intervals of content $\gamma$ it suffices to take the area between the
lower limit of the interval and the mean to be equal to the area between
the mean and the upper limit of the interval. If we let a be the lower
limit, b be the upper limit and c be the mean, the condition can be
written as

(4.3.5)     $\int_a^c f(x|y)\,dx = \gamma/2 = \int_c^b f(x|y)\,dx$ .

Since the distributions are symmetric, the length of the interval between
the lower limit and the mean is equal to the length between the mean and
the upper limit. To obtain the probabilities needed to determine the
limits of the interval we use a table of the probability integral of the
normal or Student curve, depending on the assumptions of the problem.
Prediction intervals of content $\gamma$ take the form

(4.3.6)     $E(x) \pm K_{1-\gamma}\,\mathrm{Std.\ Dev.}(x)$ ,

where $K_{1-\gamma}$ refers to the number of standard deviations one must proceed
in one direction from the mean in order to encompass a proportion $\gamma/2$ of
the area under the curve.

For the case when $\mu$ is the unknown parameter and the data
generating process is normal, assume that after t periods we have a
posterior distribution $f''(\mu_t)$ which is normal with mean $m''_t$ and
variance $\sigma^2/n''_t$. The predictive distribution at the end of period t was
shown in equation (AI.12) to be normal with mean $m''_t$ and variance
$\sigma^2[1 + (1/n''_t)]$. For this case the prediction intervals of content $\gamma$
take the form

(4.3.7)     $m''_t \pm K_{1-\gamma}\,[\sigma^2(1 + 1/n''_t)]^{1/2}$ .

For the case when both $\mu$ and $\sigma^2$ are the unknown parameters and the data
generating process is normal, assume that after t periods we have a
posterior distribution $f''(\mu_t, \sigma^2)$ which is normal-gamma-2 with parameters
$m''_t$, $n''_t$, $v''_t$ and $d''_t$. The predictive distribution at the end of period t
was shown in equation (AI.33) to be Student with mean $m''_t$ and variance
$[v''_t(n''_t + 1)/n''_t][d''_t/(d''_t - 2)]$. For this case the prediction intervals of
content $\gamma$ take the form

(4.3.8)     $m''_t \pm K_{1-\gamma}\,\{[v''_t(n''_t + 1)/n''_t][d''_t/(d''_t - 2)]\}^{1/2}$ .
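
As a minimal sketch of the symmetric interval in (4.3.7), the following Python fragment computes a prediction interval for the normal case with known variance. The function name and the numerical inputs ($m''_t = 485$, $\sigma^2 = 100$, $n''_t = 16$, $\gamma = 0.95$) are illustrative assumptions; `statistics.NormalDist` from the Python standard library stands in for the table of the normal probability integral.

```python
from statistics import NormalDist


def normal_prediction_interval(m_post, sigma2, n_post, gamma):
    """Symmetric prediction interval of content gamma, as in (4.3.7).

    m_post, n_post : posterior parameters m''_t and n''_t (hypothetical names)
    sigma2         : known process variance
    """
    # Predictive standard deviation: [sigma^2 (1 + 1/n''_t)]^(1/2)
    pred_sd = (sigma2 * (1.0 + 1.0 / n_post)) ** 0.5
    # K: number of standard deviations on each side of the mean so that
    # the central area equals gamma (gamma/2 on each side).
    k = NormalDist().inv_cdf(0.5 + gamma / 2.0)
    return m_post - k * pred_sd, m_post + k * pred_sd


# Illustrative values only.
lower, upper = normal_prediction_interval(485.0, 100.0, 16.0, 0.95)
print(round(lower, 2), round(upper, 2))
```

The Student case (4.3.8) follows the same pattern, with the Student quantile for $d''_t$ degrees of freedom in place of the normal one and the variance expression of (4.3.8) in place of $\sigma^2(1 + 1/n''_t)$.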

The predictive distribution for lognormal processes is lognormal
when $\mu$ is unknown and $\sigma^2$ is known and is logStudent when $\mu$ and $\sigma^2$ are
both unknown. The construction of prediction intervals in general and
H.P.D. intervals in particular becomes difficult for the lognormal and
the logStudent predictive distributions since these distributions are
asymmetric. In Appendix III we provide an algorithm to construct the
H.P.D. intervals when the predictive distributions are asymmetric. In
any given period t the user only has to provide the current values of
the parameters of the predictive distribution, i.e., $m''_t$ and $\sigma^2(1 + n''_t)/n''_t$
for the lognormal case and $m''_t$, $v''_t$, $n''_t$, $d''_t$ for the logStudent case. It
is shown in Appendix III that the algorithm finds the highest posterior
density intervals in very few iterations. It took about 15 iterations
to find the intervals in the examples that were considered.
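
To make the asymmetric case concrete, here is a hedged Python sketch for the lognormal predictive distribution only. It is not the Appendix III algorithm, whose details are not reproduced in this section. It relies on one fact about the lognormal density: on the log scale its log-density is quadratic in ln q with maximum at ln(mode) = m - s2, so two points of equal density lie at equal distances h on either side of m - s2. A single bisection on h then gives the H.P.D. interval. The names `lognormal_hpd` and `lognormal_pdf`, and the inputs used, are illustrative assumptions.

```python
from math import exp, log, pi, sqrt
from statistics import NormalDist


def lognormal_pdf(q, m, s2):
    """Lognormal density with log-scale mean m and log-scale variance s2."""
    return exp(-(log(q) - m) ** 2 / (2.0 * s2)) / (q * sqrt(2.0 * pi * s2))


def lognormal_hpd(m, s2, gamma, tol=1e-12):
    """H.P.D. interval of content gamma for a lognormal distribution."""
    s = sqrt(s2)
    std = NormalDist()
    log_mode = m - s2  # logarithm of the mode

    def content(h):
        # Probability of [exp(log_mode - h), exp(log_mode + h)]
        return std.cdf((log_mode + h - m) / s) - std.cdf((log_mode - h - m) / s)

    lo, hi = 0.0, 1.0
    while content(hi) < gamma:      # expand until the target is bracketed
        hi *= 2.0
    while hi - lo > tol:            # bisection on the half-width h
        mid = 0.5 * (lo + hi)
        if content(mid) < gamma:
            lo = mid
        else:
            hi = mid
    h = 0.5 * (lo + hi)
    return exp(log_mode - h), exp(log_mode + h)


# Illustrative values only.
a, b = lognormal_hpd(5.2, 1.0455, 0.90)
print(round(a, 2), round(b, 2))
```

The logStudent case does not share this log-scale symmetry about the mode in the same simple form, so a general search over equal-density endpoints would be needed there.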

In Chapter Three we pointed out that, under nonstationary
conditions, if the data generating process is normal, then for the case
when $\mu$ is the unknown parameter and for the case when $\mu$ and $\sigma^2$ are the
unknown parameters the predictive distribution changes in mean and
variance between consecutive time periods. The expectation $E_{t+1}(x_{t+1})$ is
always changing depending on the stochastic change of the mean $\mu_{t+1}$. Thus
it is not possible to establish how the H.P.D. interval for this predictive
distribution compares with the stationary H.P.D. interval under the same
assumptions. However, in the case of nonstationarity with no drift, i.e.,
u = 0, for a given posterior distribution of $\mu_t$ at time t, the only
difference between the predictive distribution of $x_{t+1}$ under stationarity
and the predictive distribution of $x_{t+1}$ under nonstationarity is the
variance term. As expected, the variance of the predictive distribution is
larger when $\mu$ is nonstationary. For normal and Student processes the H.P.D.
interval will be wider for a given content $\gamma$ when $\mu$ is nonstationary
than when $\mu$ is stationary. A comparison of stationary versus nonstationary
results when the data generating process is lognormal shows that, as in
the normal case, nonstationarity causes the prediction intervals to be
wider than under stationary conditions for both parameter uncertainty
cases.

A rather different approach to the prediction problem, termed the
Certainty Equivalent (CE) approach, is considered by Holt et al. (1960)
and Theil (1964) among others. Suppose, as in the classical school, that
the parameter $\mu$ of a normal distribution is fixed rather than random,
but that the decision maker does not know this fixed value and estimates
it by means of some statistical procedure; or consider the case where $\mu$
is random and its expectation is $E(\mu)$, but the decision maker does not
know $E(\mu)$ and estimates it. In the CE approach the decision maker uses
the estimates of the uncertain parameters in place of the relevant true
values, i.e., they are treated as if they were the actual values of the
parameters. According to the method, the point estimate constitutes a
certainty equivalent for complete knowledge of the distribution function.
That is, if the distribution of x is $f(x|\theta)$, where $\theta$ is an unknown
parameter or vector of parameters, then an estimate $\hat\theta$ for the unknown
parameter constitutes a certainty equivalent and $f(x|\hat\theta)$ is considered to
represent full knowledge of the distribution $f(x|\theta)$. The decision maker
then bases all his probability statements and decision choices on the
distribution $f(x|\hat\theta)$.

Theil (1964), Brown (1976), Barry (1974) and Barry et al. (1977)
show that the CE approach can lead to inappropriate decisions since it does
not reflect uncertainty in $\theta$ as is done in the use of predictive
distributions. However, this approach allows the decision maker to make the
probability statements of most direct interest to him without using
confidence interval terms. Thus the CE approach would seem preferable in some
respects to a classical confidence interval approach. Since there is the
problem that the true parameters may deviate from the estimates, a problem
that is variously referred to as estimation risk or parameter uncertainty,
much effort has been devoted to the task of improving the estimate that is
made of the vector of parameters $\theta$. The Bayesian, on the other hand,
assesses a probability distribution over the range of possible values $\theta$
can assume. In general, no one point $\hat\theta$ can fully capture the
information contained in this distribution except in the special case where it
is concentrated about $\hat\theta$, so the Bayesian methodology provides an approach
which uses as much information as possible. In the case where the prior
beliefs of the decision maker, with regard to the unknown parameters,
have convenient representations, i.e., mathematically tractable forms,
the Bayesian approach has been shown to perform better than the CE
approach. To the extent that a CE distribution misrepresents the
decision maker's predictive distribution, the CE approach can lead to
inappropriate decisions [see Brown (1976)]. In conclusion, since the CE
approach does not include the parameter uncertainty it understates the
uncertainty faced by the decision maker and could produce predictive
distributions that are misleading.

Since the CE approach does not consider parameter uncertainty,
it yields prediction intervals that overstate the content probability
or (equivalently) understate their risk. Thus the CE approach discards
information, i.e., the distribution of $\theta$, and then gives interval
estimates that appear more informative than the Bayesian highest posterior
density intervals. In Chapter Five we show some examples
of this condition when we present applications of the results from
Chapters Three and Four to Cost-Volume-Profit Analysis and life testing
models.
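
The overstatement of content can be illustrated with a short Python sketch for the normal case with known variance. It assumes that the CE user plugs the posterior mean in for $\mu$ and uses only the process variance $\sigma^2$, and it then evaluates the resulting interval under the predictive distribution with variance $\sigma^2(1 + 1/n'')$. The function name and the numerical inputs are hypothetical and serve only as an illustration.

```python
from statistics import NormalDist


def ce_interval_actual_content(m_post, sigma2, n_post, gamma):
    """Content, under the Bayesian predictive distribution, of the interval
    that a certainty-equivalent (CE) user would report as having content gamma."""
    k = NormalDist().inv_cdf(0.5 + gamma / 2.0)
    # CE interval: the estimate m_post is treated as the true mean, so only
    # the process variance sigma2 enters the interval width.
    ce_lower = m_post - k * sigma2 ** 0.5
    ce_upper = m_post + k * sigma2 ** 0.5
    # The predictive distribution also carries the parameter uncertainty.
    predictive = NormalDist(m_post, (sigma2 * (1.0 + 1.0 / n_post)) ** 0.5)
    return predictive.cdf(ce_upper) - predictive.cdf(ce_lower)


# Illustrative values only.
actual = ce_interval_actual_content(485.0, 100.0, 16.0, 0.95)
print(f"nominal content 0.95, predictive content {actual:.4f}")
```

In this sketch the gap between nominal and predictive content shrinks as the posterior parameter n'' grows, which is the stationary limiting behavior discussed in this chapter.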
4.4 Conclusion

In this chapter we discuss the limiting behavior of the
parameters $m'$, $v'$, $n'$ and $d'$ of the prior and predictive distributions
for the normal and lognormal data generating processes. In
addition we discuss the implications of these limiting results for
the inferences and decisions based on the posterior and predictive
distributions. The asymptotic behavior of the model has important
implications for the decision maker. An implication of the stationary
Bayesian model for normal and lognormal processes is that as additional
observations are collected parameter uncertainty is reduced and (in
the limit) eliminated altogether. In contrast, for the nonstationary
model considered in this dissertation the following inferential results
are obtained:

1. for the case of lognormal or normal model, a particular
form of stochastic parameter variation implies a treatment of data
involving the use of all observations in a differential weighting
scheme; and

2. random parameter variation produces important differences
in the limiting behavior of the prior and predictive distributions
since under nonstationarity the limiting values of the parameters of
the posterior and predictive distributions cannot be determined clearly.

The problem of constructing prediction intervals for normal,
Student, lognormal and logStudent distributions is considered in this
chapter. It is pointed out that it is easy to construct these intervals
for the normal and Student distributions but that it is rather difficult
for the lognormal and logStudent distributions. An algorithm
is presented that efficiently computes Bayesian prediction intervals
for lognormal and logStudent distributions.


CHAPTER FIVE

NONSTATIONARITY IN CVP AND STATISTICAL LIFE ANALYSIS

5.1 Introduction

In Chapter Four we pointed out that one objective of this
dissertation is to develop Bayesian prediction intervals for future
observations that come from normal and lognormal data generating processes
under conditions of nonstationary means. In particular we stressed the
importance of highest posterior density intervals as a means to convey
to the decision maker what he is entitled to believe about the predictive
distribution of the variable of interest. This kind of analysis is
particularly useful in the area of Cost-Volume-Profit (CVP) Analysis
(see Dickinson (1974), Hilliard and Leitch (1975) and Kaplan (1977)
among others) and in the area of Statistical Life Analysis (see Folk and
Browne (1975), Jones (1971) and Dunsmore (1974) among others) since the
application of the lognormal distribution is not only based on empirical
observations, but in some cases is supported by theoretical arguments.
The lognormal distribution has been found to be a serious competitor to
the Weibull distribution in representing lifetime distributions for
manufactured products.

In Section 5.2 we discuss the application of the results of
Chapters Three and Four concerning nonstationarity to the area of CVP
analysis. The problem of CVP analysis will be considered from a Bayesian
viewpoint, and inferences under the special case of nonstationarity
developed in Chapter Three will be discussed. Also the Bayesian results
under nonstationarity will be compared with some alternative approaches
suggested in the accounting literature. In Section 5.3 we incorporate
our results into the theory of Statistical Life Analysis. Practical
implications of our results for the reliability problems are discussed
with emphasis on the predictive distribution of the random variable. In
Section 5.4 we present the conclusions of the chapter.

5.2 Nonstationarity in Cost-Volume-Profit Analysis

5.2.1  Existing  Analysis 

The scope of CVP analysis ranges from determination of the optimal
output level for a single-product department to the determination of
optimal output mix for a large multi-product firm. All these decisions
rely on simple relationships between changes in revenues and costs and
changes in output levels or mixes. All CVP analyses are characterized by
their emphasis on cost and revenue behavior over various ranges of
output levels and mixes. The applicability of probabilistic models for
this analysis has been claimed because of the realism of such models.
That is, an inherent aspect of any management decision-making situation
is the presence of uncertainty concerning one or more of the relevant
factors; for example, the entire notion of forecasting the value of
some variable in the future is based on the fact that there is uncertainty
concerning that variable. The ideal model is one that gives a probability
distribution of the criterion variables, like profit, that fully
recognizes the uncertainty faced by the firm and incorporates all available
information. The realism of such a model is dependent on assumptions about
the input variables and rigorous methodology in obtaining the output
distribution.

In Chapter Two we surveyed some of the relevant literature
related to the development of CVP analysis under uncertainty. As was
pointed out in that survey, most of the papers show that those who
have studied CVP analysis have neglected one potentially important
source of uncertainty to the manager, namely the problem of parameter
uncertainty. Classical methods used in CVP analysis generate correct
confidence interval estimates only on those occasions where the manager
has no knowledge with respect to the variable he is attempting to estimate.
Such a situation seldom, if ever, occurs. Bayesian methods explicitly
treat judgmental information and take the position that any estimate
generated should reflect all the information at the manager's disposal.
This is reflected by the assignment of a prior distribution, which is
used in conjunction with observed sample evidence to form a posterior
distribution.

Dickinson (1974) addressed the problem of CVP analysis under
uncertainty by examining the reliability of using sample means and the
unbiased sample variance to estimate the means and variances of the past
distributions of sales demand. As pointed out in Chapter Two, his paper
illustrates the limitation of non-Bayesian CVP analysis of not being able
to obtain the probability statements of most interest to the manager.
The Bayesian approach provides a general procedure for describing and
analyzing any such situation without the appeal to ad hoc procedures or
ingenious tricks [see Lindley (1972)], especially through the use of the
predictive distribution. Barry, Velez and Welch (1977) have recently
applied a predictive Bayesian model to CVP analysis, explicitly allowing
for parameter uncertainty. An implication of such a model is that as
additional observations are collected parameter uncertainty is reduced
and (in the limit) eliminated altogether. Such an implication is
inconsistent with observed real world behavior largely because the conditions
under which firms operate typically change across time.

A  CVP  model  ideally  should  include  the  changing  character  of 
the  process  by  allowing  for  changes  in  the  parametric  description  of 
the  process  through  time.  Failure  to  recognize  the  nonstationary  condition 
may  result  in  misleading  inferences.  CVP  literature  has  neglected  to 
include  this  additional  source  of  uncertainty  that  influences  the  decision 
maker's  frame  of  reference  for  his  decision  process.  In  Chapter  Three  we 
showed that if the presence of nonstationarity is not fully recognized then
we can be led to a serious misinterpretation of the conclusions drawn
from a stationary model.

5.2.2 Nonstationary Bayesian CVP Model

Assume that a single product firm has a profit function defined
by

(5.2.1)    $Z = Q[P - V] - F$ ,

where Z = total profits,
      Q = sales volume in units,
      P = unit selling price,
      V = unit variable cost
and
      F = total fixed cost.

Thus the firm produces the quantity Q at a fixed cost F, and variable
cost VQ. Assume that the only random element in the system is the quantity
variable Q. In addition assume that Q is normally distributed with mean
$\mu$ and variance $\sigma^2$, i.e., $f_N(Q|\mu, \sigma^2)$. Later we will consider the cases
where other variables are random and also we will modify the analysis
to allow for lognormality in the distribution of the variables.

In general the values of the parameters of the CVP model will
be unknown. Consider a manager with a prior distribution over the parameters
of the probability model of Q, say $f'(\theta|r)$, where $\theta$ includes all the
unknown parameters and r represents all information known to the manager.
In particular assume that if $\mu$ is the only unknown parameter then the
prior distribution is the normal natural conjugate with parameters $m'$
and $\sigma^2/n'$, or that if $\mu$ and $\sigma^2$ are both unknown then the manager has
a normal-gamma natural conjugate prior with parameters $m'$, $n'$, $v'$ and $d'$.
(See de Finetti (1962, 1965), Murphy and Winkler (1970), Savage (1971),
Stael von Holstein (1970a, 1970b) and Winkler (1967a, 1967b, 1969, 1971)
for a discussion of evaluation of probability assessors and assessments.)
A formal Bayesian analysis articulates the evidence of a sample, say
$Q_1, Q_2, \ldots, Q_n$, with evidence other than that of the sample, in the
form of a prior distribution of the parameters, to obtain a posterior
distribution of the unknown parameters. In areas like CVP analysis it is
doubtful that the assumption of stationary parameters will hold over
long periods of time since variables like quantity sold (Q), costs and
contribution margin (P-V) are affected by economic, political and
environmental factors. Thus we are going to assume that the distribution
of the random variable, sales, undergoes a gradual mean shift between
successive periods of time of the form $\mu_{t+1} = \mu_t + e_{t+1}$ as defined in
(3.3.8). The Bayesian analysis provides a natural method to include the
remaining parameter uncertainty in the computation of the predictive
distribution.

The nonstationarity assumption affects the predictive distribution
of the coming period's sales quantity Q. If the process is stationary
then the predictive distribution of the random variable Q at the beginning
of period t+1 is the same as the distribution that we had at the
end of period t. However, if we assume the nonstationary condition and
that the decision maker is aware of the nonstationarity, then the prior
distribution of the parameter at the start of period t+1 has a different
mean and variance. Consequently, the predictive distribution changes in
mean and variance between consecutive time periods. In other words,
$E_{t+1}(x_{t+1})$ is always changing depending on the stochastic change of the
shift parameter $\mu_{t+1}$. In the case of nonstationarity with no drift, i.e.,
u = 0, if the distribution of sales is normal then, assuming that they
started from the same posteriors, the only difference between the predictive
distribution of $x_{t+1}$ under stationarity and the predictive distribution
of $x_{t+1}$ under nonstationarity is the variance term. The parameter $n'_{t+1}$ is
smaller when $\mu$ is unknown and nonstationary than when $\mu$ is unknown but
stationary. Hence, as expected, the variance of the predictive distribution
is larger.

The predictive distribution under nonstationarity may be used
to make probability statements about sales quantity or, if desired,
profits. To illustrate, suppose that the monthly sales $Q_1, Q_2, \ldots$ of
a firm are independent and identically distributed random variables
with common density function $f_N(Q|\mu, \sigma^2)$ and that the population
variance $\sigma^2$ is known to be 100. Suppose also that, at the beginning of a
given period t, the manager has assessed the prior distribution function
over the parameter $\mu_t$ to be

(5.2.2)    $f'_N(\mu_t | m'_t, \sigma^2/n'_t) = f'_N(\mu_t | 500, 25)$ .

Since $\sigma^2 = 100$ and $\sigma^2/n'_t = 25$, $n'_t = 4$. If the manager has available a sample
of, say, 12 monthly sales with sample mean m = 480 then he may compute
a posterior distribution of the unknown parameter $\mu_t$ which will reflect
this new information that he has available. Since the normal prior is
natural conjugate for sampling from an independent normal process the
posterior distribution of the unknown parameter $\mu_t$ will be

(5.2.3)    $f''_N(\mu_t | m''_t, \sigma^2/n''_t) = f''_N(\mu_t | 485, 6.25)$ .

The predictive distribution of $Q_t$ given the available information (and
uncertainty) about $\mu_t$ can be obtained using the posterior distribution
of $\mu_t$. The predictive distribution of sales at period t is

(5.2.4)    $f_N(Q_t | m''_t, \sigma^2[1 + 1/n''_t]) = f_N(Q_t | 485, 106.25)$ .
If now we assume that the random shock distribution is

(5.2.5)    $f_N(e_{t+1} | u, \sigma^2/n_s) = f_N(e_{t+1} | 0, 50)$

then the prior distribution of the unknown parameter $\mu_{t+1}$ at the
beginning of period t+1 may be obtained using equations (3.3.10) and (3.3.11).
This new prior distribution is

(5.2.6)    $f'_N(\mu_{t+1} | m''_t, \sigma^2(n''_t + n_s)/(n''_t n_s)) = f'_N(\mu_{t+1} | 485, 56.25)$ .

The predictive distribution under nonstationarity at the beginning
of period t+1 is

(5.2.7)    $f_N(Q_{t+1} | m'_{t+1}, \sigma^2[1 + 1/n'_{t+1}]) = f_N(Q_{t+1} | 485, 156.25)$ .

It has a higher variance than under stationarity as was pointed out in
previous paragraphs. The manager may determine, in any given period t,
the predictive distribution of profits from equation (5.2.1). Since the
predictive distribution of sales is as defined in (5.2.7), the
predictive distribution of profits is

(5.2.8)    $f_N(\pi_{t+1} | m'_{t+1}(P-V) - F, \sigma^2[1 + 1/n'_{t+1}](P-V)^2)$ .

That is, if we suppose that the contribution margin (P-V) is, say, 8, and
that the fixed costs (F) are, say, 1,000, then the predictive distribution
of next period's profits is

(5.2.9)    $f_N(\pi_{t+1} | 2{,}880,\ 10{,}000)$ .

Probability statements are easily obtained using standard normal
distribution theory. Observe how this analysis provides the probability
statements that the manager needs without the necessity of
cumbersome phrasing in terms of classical confidence intervals.
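
The arithmetic of this example can be retraced with a few lines of Python. The sketch below assumes the standard natural conjugate update for a normal mean with known variance, together with the relation $n'_{t+1} = n''_t n_s/(n''_t + n_s)$ implied by (5.2.6). The function and variable names are hypothetical.

```python
def update_known_variance(m_prior, n_prior, sample_mean, n_sample):
    """Natural conjugate update for a normal mean with known variance."""
    n_post = n_prior + n_sample
    m_post = (n_prior * m_prior + n_sample * sample_mean) / n_post
    return m_post, n_post


sigma2 = 100.0                       # known process variance
n_prior = sigma2 / 25.0              # prior variance 25 gives n' = 4
m_post, n_post = update_known_variance(500.0, n_prior, 480.0, 12)

posterior_var = sigma2 / n_post                          # (5.2.3)
pred_var_stationary = sigma2 * (1.0 + 1.0 / n_post)      # (5.2.4)

n_shock = sigma2 / 50.0              # shock variance 50 gives n_s = 2
n_next = n_post * n_shock / (n_post + n_shock)           # n'_{t+1}
prior_var_next = sigma2 / n_next                         # (5.2.6)
pred_var_nonstationary = sigma2 * (1.0 + 1.0 / n_next)   # (5.2.7)

margin, fixed_cost = 8.0, 1000.0
profit_mean = m_post * margin - fixed_cost               # mean in (5.2.9)
profit_var = pred_var_nonstationary * margin ** 2        # variance in (5.2.9)

print(m_post, posterior_var, pred_var_stationary)
print(prior_var_next, pred_var_nonstationary, profit_mean, profit_var)
```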

Let us look now at the same problem but assuming this time
that the decision maker knows neither the population variance ($\sigma^2$)
nor the population mean ($\mu$). This is the most involved of the univariate
normal cases since it requires the assignment of a bivariate
prior distribution function to $(\mu, \sigma^2)$. To illustrate, suppose that
the manager has expressed his judgments about $(\mu_t, \sigma^2)$ by a normal-gamma
distribution of the form
$f'_{N\gamma}(\mu_t, \sigma^2 | m'_t, v'_t, n'_t, d'_t) = f'_{N\gamma}(\mu_t, \sigma^2 | 500, 25, 10, 7)$.
Assume that the manager takes a random sample of 12 monthly sales, which
are assumed to come from a process with unknown mean and variance, and
that the sample yields a sample mean (m) of 480 and a sample variance
(v) of 80. He may compute a posterior distribution of the unknown
parameters $\mu_t$ and $\sigma^2$ using equations (3.3.17) and (3.3.18). Since the
normal-gamma prior is natural conjugate for sampling from an independent
normal process with unknown parameters, the posterior distribution of
$\mu_t$ and $\sigma^2$ will be

(5.2.10)    $f''_{N\gamma}(\mu_t, \sigma^2 | m''_t, v''_t, n''_t, d''_t) = f''_{N\gamma}(\mu_t, \sigma^2 | 485, 4784.4, 22, 19)$ .

Under stationarity the prior distribution at the start of period t+1
is equal to the posterior distribution at the end of period t. The
predictive distribution of $Q_t$ given the available information (and
uncertainty) about $\mu_t$ and $\sigma^2$ can be obtained using the posterior
distribution of $\mu_t$ and $\sigma^2$. From equations (3.3.49) and (3.3.50) we
know that the predictive distribution of sales at the beginning of
period t is

(5.2.11)    $f_{St}(Q_t | m''_t, v''_t(n''_t + 1)d''_t / (n''_t[d''_t - 2])) = f_{St}(Q_t | 485, 5590.32)$ .

If we now assume that the random shock distribution is

(5.2.12)    $f_N(e_{t+1} | u, \sigma^2/n_s) = f_N(e_{t+1} | 0, \sigma^2/2)$ ,

then the manager may obtain the new prior distribution of $\mu_{t+1}$ and $\sigma^2$
at the beginning of period t+1 using equations (3.3.25 - 3.3.28).
This new distribution is

(5.2.13)    $f'_{N\gamma}(\mu_{t+1}, \sigma^2 | m''_t, v''_t, n''_t n_s/(n_s + n''_t), d''_t) = f'_{N\gamma}(\mu_{t+1}, \sigma^2 | 485, 4784.4, 1.83, 19)$ .

The predictive distribution of sales for the coming period under
nonstationary conditions is Student with mean 485 and variance 8,269.28. As
expected, the additional uncertainty introduced in the model by the
shifting means has caused an increase in the variance of the predictive
distribution. If the manager does not recognize in his predictive
model the existence of a nonstationary condition he may draw
inferences from the model that are misleading.

In any given period t, the manager may determine the predictive
distribution of profits from equation (5.2.1). The predictive
distribution of profits is

(5.2.14)    $f_{St}(\pi_{t+1} | m'_{t+1}(P-V) - F, [d'_{t+1}/(d'_{t+1} - 2)][v'_{t+1}(n'_{t+1} + 1)/n'_{t+1}][P-V]^2)$ .

That is, if we assume as before that the contribution margin (P-V)
is 8, and that the fixed costs (F) are 1,000 then the predictive
distribution of next period's profits is $f_{St}(\pi_{t+1} | 2{,}880,\ 357{,}786.25)$
under stationarity and

(5.2.15)    $f_{St}(\pi_{t+1} | 2{,}880,\ 529{,}233.92)$

under nonstationarity. Probability statements are easily obtained
using the standard Student distribution tables available in many
books.
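
The predictive variances quoted in this example follow from the Student variance expression used in (5.2.11) and (5.2.14). The Python sketch below simply evaluates that expression. The function name is hypothetical, and $n'_{t+1}$ is rounded to 1.83 as printed in (5.2.13), which is an assumption about how the quoted figures were obtained.

```python
def student_predictive_variance(v, n, d):
    """Variance [v (n + 1) / n] [d / (d - 2)] of the Student predictive."""
    return v * (n + 1.0) / n * d / (d - 2.0)


v_post, n_post, d_post = 4784.4, 22.0, 19.0   # posterior values from (5.2.10)
var_stationary = student_predictive_variance(v_post, n_post, d_post)

# n'_{t+1} = n'' n_s / (n'' + n_s) with n_s = 2, rounded to two decimals
n_next = round(n_post * 2.0 / (n_post + 2.0), 2)
var_nonstationary = student_predictive_variance(v_post, n_next, d_post)

margin = 8.0
profit_var_nonstationary = var_nonstationary * margin ** 2

print(round(var_stationary, 2), round(var_nonstationary, 2))
```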

The use of normal distributions in applications where the
coefficient of variation is large can present many difficulties.
The lognormal distribution is in at least one important respect a
more realistic representation of distributions of variables that
cannot assume negative values (such as sales) than is the normal
distribution. A normal distribution assigns positive probability to
negative values, while the lognormal distribution does not. Furthermore,
even though the lognormal distribution is skewed, by taking the spread
parameter small enough, it is possible to construct a lognormal distribution
closely resembling any normal distribution (except those with high
probabilities of negative values).

Hilliard and Leitch (1975) pointed out the problem of
assuming price and quantity to be independent. However, if we assume
that sales quantity and contribution margin are jointly lognormally
distributed then we can allow for statistical dependence between the
two variables as we will show later. When it is assumed that sales
quantity and contribution margin are both lognormally distributed,
there is a closed form expression for the probability distribution
of gross profits since the product of two lognormal random variables
is also lognormally distributed.
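
The closure property can be stated explicitly: if (ln Q, ln C) is bivariate normal, where C denotes the contribution margin, then ln(QC) = ln Q + ln C is normal with mean equal to the sum of the two means and variance equal to the sum of the two variances plus twice the covariance. The Python sketch below records this and checks it by simulation. All names and parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def product_log_params(mu_q, var_q, mu_c, var_c, cov_qc=0.0):
    """Mean and variance of ln(Q*C) when (ln Q, ln C) is bivariate normal."""
    return mu_q + mu_c, var_q + var_c + 2.0 * cov_qc


# Hypothetical log-scale parameters.
mu_q, var_q = 5.0, 0.25
mu_c, var_c = 2.0, 0.04
cov_qc = -0.03   # negative dependence between quantity and margin

mu_g, var_g = product_log_params(mu_q, var_q, mu_c, var_c, cov_qc)

# Monte Carlo check: draw correlated (ln Q, ln C) and examine ln(Q*C).
rng = random.Random(0)
rho = cov_qc / math.sqrt(var_q * var_c)
logs = []
for _ in range(20000):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    ln_q = mu_q + math.sqrt(var_q) * z1
    ln_c = mu_c + math.sqrt(var_c) * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    logs.append(math.log(math.exp(ln_q) * math.exp(ln_c)))

sample_mean = sum(logs) / len(logs)
sample_var = sum((x - sample_mean) ** 2 for x in logs) / (len(logs) - 1)
print(mu_g, var_g, round(sample_mean, 3), round(sample_var, 3))
```

Note that this applies to gross profit Q(P-V); net profit subtracts the fixed cost F and is therefore a shifted lognormal variate.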

The nonstationary Bayesian CVP analysis is easily extended
to the case of a lognormal distribution of Q or to a case where sales
quantity and contribution margin are both lognormally distributed.
The extension is easy because if x is lognormal then ln x is normal.
Suppose that the distribution of sales is lognormal, i.e.,

(5.2.16)    $f_{LN}(Q | \mu, \sigma^2) = [Q \sigma \sqrt{2\pi}]^{-1} \exp[-(\ln Q - \mu)^2 / 2\sigma^2]$ ,

with unknown parameter $\mu$ and known $\sigma^2$. Note that if we consider
ln Q to be the random variable instead of Q the lognormal distribution
is easily transformed into a normal distribution and vice versa,
i.e.,

(5.2.17)    $f_{LN}(Q | \mu, \sigma^2) = Q^{-1} f_N(\ln Q | \mu, \sigma^2)$ .

Thus the predictive CVP model for normal processes presented before
can  be  extended  t<i  lognormal  processes  by  considering  In  Q  to  bo  the 
random  variable.  For  instance,  suppose  that  monthly  sales  are  dis- 
tributed lognormal ly  with  unknown  p  and  known  a^    (=1).  Again  assume 
that,  at  the  beginning  of  a  given  period  t,  the  manager  has  assessed 
the  prior  distribution  over  the  unknown  parameter  p   to  be 
f'(r'(-|ni!>  a'^/n')  =  f'Cil  I  4,  .1).  For  a  sample  of  12  months  with  mean 
6.2  the  posterior  distribution  will  be  f"(p  | m" ,  a2/n")=f"(p  |  5.2,  .0455) 
The  predictive  distribution  of  In  Q  at  period  t  is 

(5.2.18)  f^Cln^Q Jm",  o2[l  +  1/n"])  =  f,,(ln  Q  I  5.2,  1.0455), 

N      t '  t  t       N      1 1 

or  the  predictive  distribution  of  Q   at  period  t  is 

(5.2.19)  f^^(()Jm",  a2[l+  l/n^'l)  =  ^i^^iQ^]    5.2,  1.0455). 

By  the  properties  of  lognormal  random  variates  it  follows  that  Q   has 
predictive  mean  E(Q  )  =  exp  [5.2  +  1.0455/2]  =  305/74  and  predictive 
variance  Var(Qj^)  =  [exp  (10.4)]  w(w-l)  where  w  =  exp  [1.0455],  that  is 
Var(Qj.)  =  172,002.72. 
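The lognormal example can be retraced numerically. The following is a minimal sketch, assuming the usual conjugate update for a normal mean with known variance applied to ln Q; the function names are illustrative and do not come from the text.

```python
import math


def update_normal_mean(m_prior, n_prior, sample_mean, n):
    """Conjugate update for a normal mean when the variance is known."""
    n_post = n_prior + n
    m_post = (n_prior * m_prior + n * sample_mean) / n_post
    return m_post, n_post


def lognormal_moments(mu, var):
    """Mean and variance of Q when ln Q is normal with mean mu, variance var."""
    w = math.exp(var)
    mean = math.exp(mu + var / 2.0)
    variance = math.exp(2.0 * mu) * w * (w - 1.0)
    return mean, variance


sigma2 = 1.0                      # known variance of ln Q
n_prior = sigma2 / 0.1            # a prior variance of .1 implies n' = 10
m_post, n_post = update_normal_mean(4.0, n_prior, 6.2, 12)
pred_var = sigma2 * (1.0 + 1.0 / n_post)
pred_mean_q, pred_var_q = lognormal_moments(m_post, pred_var)
print(round(m_post, 4), round(pred_var, 4), round(pred_mean_q, 2))
```

The posterior mean, predictive variance of ln Q and predictive mean of Q come out at the values quoted in the example. The predictive variance of Q evaluates to roughly 1.72 x 10^5; its trailing digits are sensitive to how 1.0455 is rounded.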

To obtain probability statements regarding $Q_t$ it is necessary to translate the probability statement regarding $\ln Q_t$ using the antilogarithmic transformation. For instance, as before let the contribution margin be 8 and let fixed costs be 1,000. The probability of making more than \$3,000 in profits is equal to the probability of selling more than 500 units, $[Q_t > (\pi + F)/(P - V)]$ or $\ln Q_t > 6.2146$. Since the distribution of $\ln Q_t$ is as in (5.2.18), the probability of profits in excess of \$3,000 can be obtained from standard normal distribution theory.
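As a hedged illustration of this last step, the sketch below evaluates the tail probability of ln Q under the predictive distribution N(5.2, 1.0455), using only the standard library. The helper name `normal_sf` is an assumption made for this example.

```python
import math


def normal_sf(x, mean, var):
    """P(X > x) for X normal with the given mean and variance."""
    z = (x - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


target_profit, fixed_costs, margin = 3000.0, 1000.0, 8.0
required_units = (target_profit + fixed_costs) / margin   # 500 units
p_exceed = normal_sf(math.log(required_units), 5.2, 1.0455)
print(required_units, round(p_exceed, 3))
```

Under these inputs the probability of profits above \$3,000 is about 0.16.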

The normal model with unknown mean and variance can be extended to include the lognormal case in which the decision maker knows neither the population variance ($\sigma^2$) nor the population mean ($\mu_t$). The predictive distribution of $Q_t$ is logStudent when both parameters of the lognormal distribution are unknown. Assuming the prior is of the natural conjugate form, a simple operation transforms the logStudent distribution into a Student distribution; i.e.,

(5.2.20)  $f_{LS}(Q_t \mid \mu, \sigma^2) = Q_t^{-1} f_S(\ln Q_t \mid \mu, \sigma^2)$.

Therefore by working with $\ln Q_t$ instead of $Q_t$, the analysis of the normal process can be applied to obtain a Student predictive distribution for $\ln Q_t$ unconditioned by the unknown parameters $\mu_t$ and $\sigma^2$. To obtain probability statements for $Q_t$ and $\pi$ one needs to obtain probability statements for $\ln Q_t$. For instance, suppose that the monthly sales are distributed lognormally with unknown mean and variance and that the predictive distribution of $\ln Q_t$ is Student with mean equal to 485 and variance equal to 5590.32, i.e., $Q_t \sim LS(485,\ 5{,}590.32)$. Under the assumptions of the previous example, the probability of making more than \$3,000 is equivalent to the probability of selling more than 500 units. This probability can be obtained from the standard Student distribution.

An important feature of the model that we have developed is its unequal weighting of past observations, a characteristic that clearly demonstrates the problem faced by users who apply stationary inferences when the variables really are nonstationary. It was pointed out before that the posterior value of the prior parameter $m'_t$ during any given period t is $m''_t = (m'_t n'_t + m_t n_t)/(n'_t + n_t)$. Under stationarity, successively applying this equation gives $m'_{t+1}$ as a function of $m'_1$, the initial prior mean, of n, the sample size, and of the past sample means. All past observations are weighted equally and $m'_{t+1}$ can be expressed in the form

(5.2.21)  $m'_{t+1} = \left(n'_1 m'_1 + n \sum_{i=1}^{t} m_i\right) / (n'_1 + tn)$

or

(5.2.22)  $m'_{t+1} = \left(n'_1 m'_1 + \sum_{i=1}^{t} Q_i\right) / (n'_1 + tn)$,

where $Q_i = \sum_{k=1}^{n} Q_{ik}$.

Under nonstationarity, $n'_{t+1} = n''_t n_\epsilon / (n''_t + n_\epsilon)$, and $n'_{t+1} < n''_t$. If we assume nonstationarity with no drift, i.e., u = 0, and define $q_t = n'_t/(n'_t + n)$, then the posterior value of the prior mean parameter in period t+1 is

(5.2.23)  $m'_{t+1} = q_t m'_t + (1 - q_t) m_t$.

Successively applying (5.2.23) gives $m'_{t+1}$ as a function of $m'_1$, the initial mean, and of $m_i$ and $q_i$ for i = 1, 2, ..., t. It was shown in Appendix I and in Chapter Four that the weight assigned to any observation, say $Q_{t-i}$, in determining a prior distribution for $\mu_{t+1}$ is a strictly decreasing function of i. That is, the importance of any observed value, say $Q_{t-i}$, for making inferences about a future value of the mean, say $\mu_{t+1}$, decreases as i increases. For the special case in which $n'_{t+1} = n'_t$ we showed that the prior mean at the beginning of any period under nonstationarity can be expressed as the sum of the initial mean, $m'_1$, discounted by a factor $q^t$, and an exponentially weighted sum of the observed sample means; i.e.,

$m'_{t+1} = q^t m'_1 + (1 - q) \sum_{i=0}^{t-1} q^i m_{t-i}$.

To illustrate, suppose that the monthly sales $Q_1, Q_2, \ldots$ of a firm are independent and identically distributed random variables with common density function $f_N(Q \mid \mu_t, \sigma^2 = 100)$. Assume that the random shock distribution is $f_N(e \mid 0, 50)$. Suppose also that, at the beginning of period 1, the manager has assessed the prior distribution function to be $f'_N(\mu_1 \mid 500, 57.28)$. To obtain a simpler expression for comparisons with the stationary case, we are assuming that at the beginning of the first period the model is already in steady state form in the sense that $n'_1 = n'_2 = \cdots = 1.74596$ and $q_1 = q_2 = \cdots = .127016$. If the manager has available a sample of, say, 12 monthly sales with sample mean $m_1 = 480$, then the mean of the posterior distribution of $\mu_1$ is 482.5403.
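The steady-state values quoted in this example can be checked with a short fixed-point iteration. This is a sketch under one assumption: with no drift, the prior variance for the next period equals the current posterior variance plus the shock variance, so that the equivalent prior sample size is the process variance divided by that sum. The function names are hypothetical.

```python
def next_prior_n(n_prior, n, sigma2, shock_var):
    """Equivalent prior sample size for the next period (assumed variance rule)."""
    n_post = n_prior + n
    prior_var_next = sigma2 / n_post + shock_var
    return sigma2 / prior_var_next


def steady_state(n, sigma2, shock_var, start=1.0, iterations=50):
    """Iterate the update until the prior sample size settles."""
    n_prior = start
    for _ in range(iterations):
        n_prior = next_prior_n(n_prior, n, sigma2, shock_var)
    return n_prior, n_prior / (n_prior + n)


n_steady, q_steady = steady_state(12, 100.0, 50.0)
print(round(n_steady, 5), round(q_steady, 6), round(100.0 / n_steady, 2))
```

With process variance 100, shock variance 50 and samples of 12, the iteration settles near n' = 1.74596 and q = .127016, and the implied prior variance is close to the 57.28 assessed in the example.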

Since we are assuming that there is nonstationarity with no drift, the mean of the prior distribution of $\mu_2$ under both stationarity and nonstationarity is 482.5403. If the manager has available, during period 2, a new sample of 12 monthly sales with a sample mean $m_2 = 505$, then he may compute a new posterior distribution of $\mu_2$ which reflects this new information. Under stationarity the mean of the posterior distribution of $\mu_2$ will be

(5.2.24)  $m''_2 = [1.74596(500) + 12(480 + 505)] / [1.74596 + 2(12)] = 493.0155$.

Under nonstationarity with $n'_1 = n'_2 = 1.74596$ the mean of the posterior distribution of $\mu_2$ will be

(5.2.25)  $m''_2 = (.127016)^2(500) + (1 - .127016)[505 + (.127016)(480)] = 502.1473$.

During period 3, if a sample of 12 observations is available that yields a sample mean $m_3 = 520$, then the mean of the posterior distribution of $\mu_3$ will be $m''_3 = 501.5895$ under stationarity. Under nonstationarity and the steady state condition the mean of the posterior distribution of $\mu_3$ will be

(5.2.26)  $m''_3 = (.127016)^3(500) + (1 - .127016)[520 + (.127016)(505) + (.127016)^2(480)] = 517.7324$.

As we move into the future the initial prior mean has less weight in the determination of the prior mean $m'_{t+1}$. From the exponentially weighted sum of sample means we note that recent sample means are weighted more heavily than less recent ones. The impact of a particular sample mean on future values of the prior distribution of $\mu_t$ decreases as t increases.
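The contrast between equal and exponentially declining weights can be sketched as follows. The code assumes the steady-state case with constant q and no drift; the function names are illustrative only.

```python
def stationary_mean(m1, n1, n, sample_means):
    """Equal weighting of all sample means, as in (5.2.21)."""
    t = len(sample_means)
    return (n1 * m1 + n * sum(sample_means)) / (n1 + t * n)


def nonstationary_mean(m1, q, sample_means):
    """Repeated application of (5.2.23) with constant q and no drift."""
    m = m1
    for sample_mean in sample_means:
        m = q * m + (1.0 - q) * sample_mean
    return m


def steady_state_weights(q, t):
    """Weight on the initial mean and on sample means m_1, ..., m_t."""
    return q ** t, [(1.0 - q) * q ** (t - i) for i in range(1, t + 1)]


q, n1 = 0.127016, 1.74596
means = [480.0, 505.0, 520.0]
print(round(nonstationary_mean(500.0, q, means[:2]), 4))
print(round(nonstationary_mean(500.0, q, means), 4))
print(round(stationary_mean(500.0, n1, 12, means), 4))
initial_weight, sample_weights = steady_state_weights(q, 3)
```

The weights always sum to one, and under nonstationarity the most recent sample mean carries a weight of 1 - q, which is why the period 3 estimate sits so much closer to 520 than the stationary estimate does.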

5.2.3  Extensions  to  the  Nonstationary  Bayesian  CVP  Model 

It is possible to significantly extend the model presented in the previous section by assuming that sales, Q, and contribution margin, (P-V), are normally or lognormally distributed. For instance, suppose that Q and (P-V) are both normally distributed with unknown means $\mu_Q$ and $\mu_{(P-V)}$. The predictive distribution of Q and the predictive distribution of (P-V) are normally distributed. It is well known [see Ferrara, Hayya and Nachman (1972)] that the distribution of the product of two normally distributed random variables is not normally distributed. However, if we denote $Q^* = e^{Q}$ and $(P-V)^* = e^{(P-V)}$ to be the new random variables, then $Q^*$ and $(P-V)^*$ are lognormally distributed. If a conjugate prior is assigned to $\mu$, then the predictive distribution of a lognormally distributed variable when $\mu$ is unknown is also lognormal; hence both $Q^*$ and $(P-V)^*$ have lognormal predictive distributions. Moreover, Patel, Kapadia and Owen (1976) point out that if $x_1$ and $x_2$ are independent lognormal random variables with probability density functions $f(x_1 \mid \theta_1, \theta_2)$ and $f(x_2 \mid \alpha_1, \alpha_2)$, respectively, then the random variable $Y = x_1 x_2$ also has a lognormal distribution with probability density function $f(Y \mid \theta_1 + \alpha_1, \theta_2 + \alpha_2)$. Suppose then that Q and (P-V) are both lognormally distributed with unknown parameters $\mu_Q$ and $\mu_{(P-V)}$, respectively. Both have lognormal predictive distributions and hence Q[P-V] is lognormally distributed. To illustrate, suppose that at any given period t the predictive distribution of sales ($Q_t$) is given by $f_{LN}[Q_t \mid m''_{Q_t}, \sigma_Q^2(1 + 1/n''_{Q_t})]$ and that the predictive distribution of the contribution margin $(P-V)_t$ is given by $f_{LN}[(P-V)_t \mid m''_{(P-V)_t}, \sigma_{(P-V)}^2(1 + 1/n''_{(P-V)_t})]$. Then the predictive distribution of Q[P-V] is given by

(5.2.27)  $f_{LN}[Q_t (P-V)_t \mid m''_{Q_t} + m''_{(P-V)_t},\ \sigma_Q^2(1 + 1/n''_{Q_t}) + \sigma_{(P-V)}^2(1 + 1/n''_{(P-V)_t})]$.

Once we find the predictive distribution as defined in (5.2.27) we can find the distribution of profits as was explained before.
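The additive rule for independent lognormal variables can be sketched in a few lines. The numerical inputs below are hypothetical and chosen only for illustration, and the two predictive distributions are assumed to be independent.

```python
import math


def predictive_lognormal(m_post, n_post, sigma2):
    """Parameters (mean, variance) of ln X under the predictive distribution."""
    return m_post, sigma2 * (1.0 + 1.0 / n_post)


def product_of_lognormals(params_a, params_b):
    """Parameters of ln(A * B) for independent lognormal A and B."""
    return params_a[0] + params_b[0], params_a[1] + params_b[1]


# Hypothetical inputs: sales as in the earlier example, plus an invented margin.
sales_params = predictive_lognormal(5.2, 22.0, 1.0)
margin_params = predictive_lognormal(2.0, 10.0, 0.04)
gross_mu, gross_var = product_of_lognormals(sales_params, margin_params)
expected_gross = math.exp(gross_mu + gross_var / 2.0)
print(round(gross_mu, 4), round(gross_var, 4))
```

Because only the parameters of the logarithms are added, the expected gross contribution follows from the usual lognormal mean formula applied to the summed parameters.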

We cannot extend our analysis to the cases where Q and (P-V) are both normally or lognormally distributed with unknown means $\mu_Q$ and $\mu_{(P-V)}$ and unknown variances $\sigma_Q^2$ and $\sigma_{(P-V)}^2$. For the case in which both Q and (P-V) are normally distributed it was shown in Chapter Four that the predictive distributions are Student. The distribution of the product of two Student random variables does not have a tractable closed form except when the parameters of the two distributions are the same. In this case the distribution of the product is an F distribution. If conjugate priors are assigned to the unknown parameters of the distributions of Q and (P-V), then for the case in which Q and (P-V) are lognormally distributed the predictive distributions are logStudent. We cannot extend our analysis to this case either, because the distribution of the product of two logStudent random variables does not have a tractable closed form in the general case.

We can address the previous problem from a different viewpoint. Suppose that Q and (P-V) have a joint lognormal distribution with parameters $\mu$ and $\Sigma$ where

(5.2.28)  $\mu = \begin{pmatrix} \mu_Q \\ \mu_{(P-V)} \end{pmatrix}$

and

(5.2.29)  $\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix}$.

Given that $\mu$ is unknown and $\Sigma$ is known, the decision maker can assess a joint prior distribution on the vector of unknown parameters. A joint predictive distribution for Q and (P-V) can be obtained from the posterior distribution of $\mu$. This approach works if $\Sigma$ is known; otherwise there is not a tractable closed form in the general case.
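When the logarithms of Q and (P-V) are jointly normal, the logarithm of their product is the sum of two correlated normal variables, so the covariance term enters the variance. The sketch below assumes the predictive mean vector and covariance matrix of the logarithms are already available; the numbers are hypothetical.

```python
def product_params_joint(mean_vec, cov):
    """Parameters of ln(Q * (P-V)) when (ln Q, ln(P-V)) is bivariate normal."""
    mu = mean_vec[0] + mean_vec[1]
    var = cov[0][0] + cov[1][1] + cov[0][1] + cov[1][0]
    return mu, var


# Hypothetical predictive parameters with a negative covariance,
# for instance when higher margins tend to go with lower volume.
pred_mean = [5.2, 2.0]
pred_cov = [[1.0, -0.1],
            [-0.1, 0.04]]
joint_mu, joint_var = product_params_joint(pred_mean, pred_cov)
print(round(joint_mu, 4), round(joint_var, 4))
```

Setting the off-diagonal entries to zero recovers the independent case, in which the variances simply add.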

The nonstationary Bayesian CVP model presented in the previous section can be extended to the multiproduct case. In any given period t, the random variables of interest are vectors $Q_t$ of quantities sold for products 1, 2, ..., P; i.e., $Q_t = (Q_{t1}, Q_{t2}, \ldots, Q_{tP})$. Suppose that $Q_t$ is multivariate normally distributed with mean vector $\mu_t$ and covariance matrix $\Sigma$. A Bayesian analysis involves the assessment of a prior distribution for $\mu_t$ if only the vector of means is unknown, or a joint prior on $(\mu_t, \Sigma)$ if both parameters are unknown. After a vector $Q_t$ is observed, the posterior distribution of the unknown parameters is available. Next, assume that values of the mean vector for successive time periods are related as $\mu_{t+1} = \mu_t + e_{t+1}$, where $e_{t+1}$ is a multinormal "random shock" term independent of $\mu_t$ with known mean vector u and covariance matrix $\Omega$.

For the case in which $\mu_t$ is the unknown vector of parameters, Winkler and Barry (1973) discuss the methodology to obtain the posterior distribution of $\mu_t$ and the predictive distribution of the vector of quantities sold, $Q_t$. They pointed out that the updating procedure for the model is relatively straightforward but that difficulties are encountered in attempting to investigate limiting properties of the model. Simplifying assumptions which produce limiting results are:

1. the prior information at the beginning of period one can be thought of as equivalent to the information obtained from a sample of size $n'_1$ from the process, and therefore the covariance matrix of the initial distribution, say $S'_1$, can be thought of as a constant multiple of $\Sigma$; i.e., $S'_1 = (n'_1)^{-1} \Sigma$;

2. the random shocks that change the mean vector from period to period are such that they do not change the underlying relationship among the elements of the mean vector, and therefore the covariance matrix $\Omega$ can be thought of as a constant multiple of $\Sigma$; say $\Omega = w \Sigma$.

If we make the same simplifying assumptions as in Winkler and Barry (1973) and in addition assume that from period to period the unknown covariance matrix $\Sigma$ does not change, then we can extend the methodology from the univariate to the multivariate case for the case in which $\mu_t$ and $\Sigma$ are both unknown. Under these assumptions, during time period t we can revise the joint distribution on $(\mu_t, \Sigma)$ and at the end of time period t (the beginning of time period t+1) determine the new mean vector $\mu_{t+1}$, which reflects the effects of the random shock. From the prior distribution of $(\mu_{t+1}, \Sigma)$ the decision maker can determine the predictive distribution of quantities sold and the predictive distribution of profits.
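Under the two simplifying assumptions, every covariance matrix in the recursion is a scalar multiple of the process covariance matrix, so the bookkeeping reduces to a scalar equivalent sample size. The sketch below covers only the simpler case of a known covariance matrix and is an illustration, not the dissertation's derivation; the function names and the two-product numbers are hypothetical.

```python
def update_mean_vector(m_prior, n_prior, sample_mean, n):
    """Posterior mean vector when the prior covariance is Sigma / n_prior."""
    n_post = n_prior + n
    m_post = [(n_prior * a + n * b) / n_post
              for a, b in zip(m_prior, sample_mean)]
    return m_post, n_post


def shift_to_next_period(m_post, n_post, w, drift=None):
    """Prior for the next period: covariance Sigma / n_post + w * Sigma."""
    drift = drift or [0.0] * len(m_post)
    m_next = [a + u for a, u in zip(m_post, drift)]
    n_next = 1.0 / (1.0 / n_post + w)
    return m_next, n_next


# Two products; the first reuses the univariate example with w = 50 / 100.
m_post, n_post = update_mean_vector([500.0, 200.0], 1.74596, [480.0, 210.0], 12)
m_next, n_next = shift_to_next_period(m_post, n_post, 0.5)
print([round(v, 4) for v in m_post], round(n_next, 5))
```

With these inputs the first component reproduces the univariate posterior mean 482.5403, and the equivalent prior sample size returns to its steady-state value.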

5.3  Nonstationarity in Statistical Life Analysis

5.3.1  Existing Analysis

Reliability theory is the discipline that deals, among other things, with procedures to ensure the maximum effectiveness of manufactured articles. In general, life length is random, and so we are led to a study of life distributions. For instance, Farewell and Prentice (1977) emphasize the applicability of lognormal models to recent data sets from the industrial and medical literature. Reliability theory emphasizes the prediction, estimation and optimization of the probability of survival, the mean life, or more generally, the life distribution of components or systems.

In the traditional approach to life testing inference, point or interval estimators for functions of the life distributions were obtained by substituting for the unknown parameters the point estimators obtained for them. Most uses of Bayesian methods can be characterized as point or interval estimation of parameters of life distributions or of reliability functions.

All of the papers discussed in Chapter Two that have considered life testing problems have assumed a stationary situation. However, no matter how hard the company works to maintain constant conditions during a production process, fluctuations in the production factors can lead to significant variation in the properties of the finished products. Variation in inputs, in some cases, tends to be purely random and could gradually change the characteristics of the life distributions of the products. Moreover, the wearout of the machines used in the manufacture of the products could cause changes in the quality of the products and hence in the parameters of the life distributions. Again we want to stress that we are referring to gradual changes, the effects of which are not perfectly predictable in advance for a particular period; i.e., the characteristics of the process vary across time but are relatively constant within a given period. In our opinion the model developed in Chapters Three and Four provides a convenient framework to study the effects of nonstationarity on the inferences drawn from life testing statistical models.

5.3.2  A Life Testing Model Under Nonstationarity

A natural framework for studying the problem of changing parameters in terms of forecasting the life of a manufactured product is provided by the Bayesian approach to statistical inference. Given a product, let us consider the random interval beginning with the moment the product starts to work and ending at the moment of its failure. This positive random variable is called the life time of the product or the time to failure.

Suppose that the model for the life of a product is the lognormal distribution with parameters $\mu$ and $\sigma^2$; i.e., the lives of products coming from a given process, $L_1, L_2, \ldots$, are independent and identically distributed random variables with common density function $f_{LN}(L \mid \mu, \sigma^2)$. Suppose also that a posterior distribution over the unknown parameters is available and that the distribution of the random variable, life, undergoes a gradual parameter shift between successive periods of time of the form $\mu_{t+1} = \mu_t + e_{t+1}$ as defined in (3.3.8). From a formal Bayesian analysis, during a given period t, two distributions are available, namely the posterior distribution and the predictive distribution of a future observation which comes from the same data generating process.

If $\mu_t$ is the only unknown parameter and the prior distribution is natural conjugate to the process, then the posterior distribution is $f''_N(\mu_t \mid m''_t, \sigma^2/n''_t)$ and the predictive distribution is $f_{LN}(L_t \mid m''_t, \sigma^2[(1 + n''_t)/n''_t])$, as defined in (3.3.6) and (3.3.7).

If $\mu_t$ and $\sigma^2$ are both unknown and the prior distribution is natural conjugate to the lognormal distribution, then the posterior distribution is $f''_{N\gamma}(\mu_t, \sigma^2 \mid m''_t, n''_t, v''_t, d''_t)$ and the predictive distribution is logStudent with infinite mean and variance, as defined in (3.3.17)-(3.3.20).

Under both uncertainty situations the posterior distribution (or the prior if no sample evidence was included) reflects whatever is known concerning the parameters of interest, and it also fully reflects the remaining uncertainty the manager has concerning the parameters. A large part of the statistical problem in reliability involves the estimation (point or interval) of parameters in failure distributions. Each of the non-Bayesian methods of obtaining point estimates given in Chapter Two has certain statistical properties that make it desirable from a theoretical viewpoint. From the Bayesian standpoint the posterior distribution should be used to derive the point or interval estimators of the unknown parameters, except under nonstationarity, in which case the new prior should be used. With respect to inferences, the manager considers the entire posterior distribution (or any probability determined from this distribution) as an inferential statement, and he may not be interested in a single point estimate. For instance, some potential estimators of $\mu_t$ based on the normal posterior distribution, for the case when only $\mu_t$ is unknown, are the posterior mean, the posterior median, the posterior mode, and so on. Since the normal is unimodal and symmetric, the posterior mean, $m''_t$, is equal to the posterior median and to the posterior mode. On the other hand, if an interval of values for $\mu_t$ rather than a single value is desired, then from the normal posterior distribution the probability of any interval of values of $\mu_t$ can be determined. It was shown in Chapter Three that the presence of nonstationarity produces greater uncertainty (variance), at the start of period t+1, with respect to the unknown parameter than would be present under stationarity, because in the stationary case $n'_{t+1} = n''_t$. Thus we would
expect  to  have  wider  intervals  for  a  given  y  content;  and  after 
several  periods  the  intervals  will  also  be  shifted  in  location 
since  they  will  differ  in  means. 
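The widening of intervals under nonstationarity can be illustrated with the sales figures used earlier in the chapter (process variance 100, shock variance 50, posterior mean 482.5403 with n'' = 13.74596). The sketch assumes only the mean is unknown and that the shock variance is added to the posterior variance at the start of the next period; `central_interval` is a hypothetical helper.

```python
from statistics import NormalDist


def central_interval(mean, variance, content=0.95):
    """Central interval of a normal distribution with the given content."""
    dist = NormalDist(mean, variance ** 0.5)
    tail = (1.0 - content) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


sigma2, shock_var = 100.0, 50.0
m_post, n_post = 482.5403, 13.74596

stationary = central_interval(m_post, sigma2 / n_post)
nonstationary = central_interval(m_post, sigma2 / n_post + shock_var)
width_stationary = stationary[1] - stationary[0]
width_nonstationary = nonstationary[1] - nonstationary[0]
print(round(width_stationary, 2), round(width_nonstationary, 2))
```

Both intervals are centered on the same mean at the start of the next period; only their widths differ until new data arrive.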

For the case in which both parameters ($\mu_t$ and $\sigma^2$) of the lognormal distribution are unknown, some potential point or interval estimators are based on the marginal distributions obtained from the joint posterior distribution function. In any given period t, if the joint posterior distribution of the unknown parameters of the lognormal life density function is normal-gamma, as defined in Section 3.3.2, then the marginal distribution of $\sigma$ is inverted-gamma-2, as defined by

(5.3.1)  $f_{i\gamma 2}(\sigma \mid v''_t, d''_t) = \dfrac{2 \exp[-d''_t v''_t / 2\sigma^2]\, [d''_t v''_t / 2\sigma^2]^{(d''_t/2) + 1/2}}{\Gamma(d''_t/2)\, [d''_t v''_t / 2]^{1/2}}$,

with mean

(5.3.2)  $E(\sigma \mid v''_t, d''_t) = \sqrt{v''_t d''_t / 2}\; [d''_t/2 - 3/2]! \,/\, \Gamma(d''_t/2)$

and variance

(5.3.3)  $V(\sigma \mid v''_t, d''_t) = [v''_t d''_t / (d''_t - 2)] - [E(\sigma)]^2$.

The marginal distribution of $1/\sigma^2$ is gamma-2. The cumulative distribution function of the inverted-gamma-2 is related to the cumulative distribution function of the gamma-2 variable by

(5.3.4)  $G_{i\gamma 2}(\sigma \mid v''_t, d''_t) = F_{\gamma 2}(1/\sigma^2 \mid v''_t, d''_t)$,

where $G(a) = \int_a^{\infty} f(x)\,dx$ and $F(a) = \int_0^{a} f(x)\,dx$ [see Raiffa and Schlaifer (1961)]. The marginal distribution of $\mu_t$ is Student as defined by $f_S(\mu_t \mid m''_t, n''_t/v''_t, d''_t)$. Point or interval estimators may be obtained from (5.3.1) or from the Student marginal distribution of $\mu_t$.
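The inverted-gamma-2 density and its first two moments can be evaluated directly with the standard library. The sketch below writes the factorial term of the mean as a gamma function, using x! = Gamma(x + 1); the parameter values v = 4 and d = 10 are hypothetical.

```python
import math


def inverted_gamma2_pdf(sigma, v, d):
    """Inverted-gamma-2 density of sigma with parameters v and d."""
    a = d * v / 2.0
    x = a / sigma ** 2
    return (2.0 * math.exp(-x) * x ** (d / 2.0 + 0.5)
            / (math.gamma(d / 2.0) * math.sqrt(a)))


def inverted_gamma2_mean(v, d):
    """Mean of sigma; requires d > 1."""
    log_ratio = math.lgamma(d / 2.0 - 0.5) - math.lgamma(d / 2.0)
    return math.sqrt(d * v / 2.0) * math.exp(log_ratio)


def inverted_gamma2_variance(v, d):
    """Variance of sigma; requires d > 2."""
    return v * d / (d - 2.0) - inverted_gamma2_mean(v, d) ** 2


v, d = 4.0, 10.0   # hypothetical posterior parameters
print(round(inverted_gamma2_mean(v, d), 4),
      round(inverted_gamma2_variance(v, d), 4))
```

A crude numerical integration of the density is a convenient sanity check that the normalizing constant and the mean formula agree with each other.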

Sometimes the people working with life testing models are interested in the distribution of the median and of the mean of the lognormally distributed variables. The median and the mean of lognormally distributed random variables are given by $\zeta = \exp(\mu)$ and $\xi = \exp(\mu + \sigma^2/2)$. For a given period t, the conditional posterior probability density function of $\mu_t$ given $\sigma$ is normal with mean $m''_t$ and variance $\sigma^2/n''_t$; then $\zeta$, given $\sigma$, has a lognormal posterior probability density function. The marginal posterior probability density function for $\mu_t$ is Student; thus $\zeta$ has a posterior density function which is logStudent [Zellner (1971)]. Similarly, given $\sigma$, the conditional posterior probability density function for $\xi$ is lognormal. Again these distributions incorporate all the available prior and sample information and can be employed to obtain point estimates, to make probability statements about parameters' values, to perform Bayesian tests of hypotheses and to derive predictive probability distributions for future observations coming from the lognormal life testing model.
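For the case in which the spread parameter is treated as given, posterior intervals for the median and the mean of the life distribution follow by exponentiating a normal interval. This is a minimal sketch with hypothetical posterior values (m'' = 5.2, n'' = 22, variance of the logarithm equal to 1); the function names are illustrative.

```python
import math
from statistics import NormalDist


def median_interval(m_post, n_post, sigma2, content=0.95):
    """Central posterior interval for the median exp(mu), sigma given."""
    z = NormalDist().inv_cdf(0.5 + content / 2.0)
    half_width = z * math.sqrt(sigma2 / n_post)
    return math.exp(m_post - half_width), math.exp(m_post + half_width)


def mean_interval(m_post, n_post, sigma2, content=0.95):
    """Matching interval for the mean exp(mu + sigma^2 / 2), sigma given."""
    low, high = median_interval(m_post, n_post, sigma2, content)
    shift = math.exp(sigma2 / 2.0)
    return low * shift, high * shift


median_low, median_high = median_interval(5.2, 22.0, 1.0)
mean_low, mean_high = mean_interval(5.2, 22.0, 1.0)
print(round(median_low, 2), round(median_high, 2))
```

The interval for the median is symmetric on the logarithmic scale rather than the original scale, which reflects the skewness of the lognormal posterior.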

As  discussed  in  Chapter  Four,  a  prediction  interval  is  dif- 
ferent from  a  confidence  interval  for  an  unknown  population  parameter 
(such  as  the  population  mean)  or  from  a  tolerance  interval  to  contain 
a  specified  proportion  of  the  population.  It  is  sometimes  of  interest 
to  obtain  a  value,  arrived  at  by  life  testing,  that  with  high  proba- 
bility will  be  less  than  the  life  length  of  a  particular  component 
that  is  to  be  used  in  a  one  trial  system.  In  many  practical  problems 
in  industry,  it  is  desired  to  use  the  results  of  a  previous  sample  to 
predict  the  results  of  a  future  sample.  For  example,  data  on  warranty 
values  on  engines  over  the  past  three  years  might  be  used  for  plan- 
ning purposes  to  obtain  limits  that  will  contain  the  warranty  in 
the  coming  year  with  a  high  probability.  Such  problems  can  be  handled 
by  Bayesian  prediction  intervals.  Prediction  intervals  are  also  of 
special  interest  to  engineers  who  are  concerned  with  setting  limits 
on  the  performance  of  a  small  number  of  units  of  a  product.  Such 
limits  would  be  required,  for  example,  in  setting  specifications  to 
contain  with  a  high  probability  a  critical  performance  characteristic 
for  all  units  in  an   order  of  three  heavy  transformers  when  the  only 
available  information  is  the  data  on  five  previous  transformers  of 
the same type. By using the limits of a prediction interval as specification limits, one can state that with a specified probability all three transformers will meet specifications [Hahn and Nelson (1973)]. Prediction intervals are also required by the typical customer who
purchases  one  or  a  small  number  of  units  of  a  given  product  and  who 
must  set  limits  on  the  performance  values  of  the  particular  units  he 
will  purchase. 
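A one-sided Bayesian prediction limit for a single future life can be sketched as follows, assuming a lognormal life distribution with known variance of the logarithm, so that the predictive distribution of the log life is normal. The numbers are hypothetical, and the equivalent sample size passed in would be the posterior value under stationarity or the smaller new prior value under nonstationarity.

```python
import math
from statistics import NormalDist


def lower_prediction_limit(m, n_equiv, sigma2, probability=0.95):
    """Life length exceeded by one future unit with the given probability."""
    pred_sd = math.sqrt(sigma2 * (1.0 + 1.0 / n_equiv))
    z = NormalDist().inv_cdf(probability)
    return math.exp(m - z * pred_sd)


def survival_probability(limit, m, n_equiv, sigma2):
    """Predictive probability that a future life exceeds the limit."""
    pred_sd = math.sqrt(sigma2 * (1.0 + 1.0 / n_equiv))
    return 1.0 - NormalDist(m, pred_sd).cdf(math.log(limit))


limit_stationary = lower_prediction_limit(5.2, 13.74596, 1.0)
limit_nonstationary = lower_prediction_limit(5.2, 1.74596, 1.0)
print(round(limit_stationary, 2), round(limit_nonstationary, 2))
```

The smaller equivalent sample size under nonstationarity gives a wider predictive distribution and therefore a more conservative lower limit.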

The prior mean at the beginning of any period, $m'_{t+1}$, under nonstationarity can be expressed as the sum of the initial mean, $m'_1$, discounted by a factor that decreases with time, and an exponentially weighted sum of the sample observations. This relationship provides probably the most interesting aspect of the nonstationary Bayesian model, particularly for the life testing problem. Since most of the point and interval estimates discussed in previous paragraphs are functions of $m'_{t+1}$, they are also unequally weighted functions of past data. This gives a Bayesian interpretation and justification for the old production management idea of exponential smoothing. A strong argument is made that since the most recent observations contain the most information about what will happen in the future, they should be given relatively more weight than the older observations. A limitation of exponential smoothing techniques is that there is no good rule for determining the appropriate value of the weights to be assigned to each observation. The nonstationary Bayesian model provides a rule to determine the set of weights to be assigned to the observations.

The exponentially weighted moving average forecast arises from the following model of expectations adapting to changing conditions. Let $y_t$ represent that part of a time series which cannot be explained by trend, seasonal, or any other systematic factors; and let $\hat{y}_t$ represent the forecast, or expectation, of $y_t$ on the basis of information available through the (t-1)st period. It is assumed that the forecast is changed from one period to the next by an amount proportional to the last observed error. That is, $\hat{y}_t = \hat{y}_{t-1} + \beta(y_{t-1} - \hat{y}_{t-1})$, $0 < \beta < 1$. The solution of the above difference equation gives the formula for the exponentially weighted forecast:

$\hat{y}_t = \beta \sum_{i=1}^{\infty} (1 - \beta)^{i-1} y_{t-i}$.
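The link between exponential smoothing and the steady-state Bayesian recursion can be made concrete by setting the smoothing constant to 1 - q. The sketch below uses a finite history with an explicit starting forecast, an assumption made here so that the weighted sum is finite; the function names are illustrative.

```python
def smooth_recursive(observations, beta, initial_forecast):
    """Adjust the forecast by a fraction beta of each observed error."""
    forecast = initial_forecast
    for obs in observations:
        forecast = forecast + beta * (obs - forecast)
    return forecast


def smooth_weighted(observations, beta, initial_forecast):
    """Same forecast written as an exponentially weighted sum."""
    t = len(observations)
    total = (1.0 - beta) ** t * initial_forecast
    for i, obs in enumerate(reversed(observations), start=1):
        total += beta * (1.0 - beta) ** (i - 1) * obs
    return total


q = 0.127016                      # steady-state value from the sales example
sample_means = [480.0, 505.0, 520.0]
recursive = smooth_recursive(sample_means, 1.0 - q, 500.0)
weighted = smooth_weighted(sample_means, 1.0 - q, 500.0)
print(round(recursive, 4), round(weighted, 4))
```

Both forms give the period 3 posterior mean of the sales example, which illustrates how the nonstationary model fixes the smoothing constant through q instead of leaving it to be chosen ad hoc.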

5.4  Conclusion

In this chapter, Bayesian models for Cost-Volume-Profit analysis and for life testing under nonstationarity have been presented. The Bayesian character of the models is reflected by the assignment of a prior distribution to the unknown parameters, which recognizes all the uncertainty the decision maker has concerning the parameters. In the case of CVP analysis, the input to the forecasting model is not only the past history of sales of the item; direct information concerning the market, the industry, the economy, sales of competing and complementary products, price changes, advertising campaigns, and so on is also used. A similar range of information is incorporated in life testing models. The analysis also emphasizes that such a model ideally should include the changing character of the parameters of economic and life distributions by allowing for changes in the parametric description of the process through time.

For the case of normal and lognormal data generating processes, under a particular form of stochastic parameter variation it is shown that the presence of nonstationarity produces greater uncertainty for the manager, which is reflected in these particular cases by an increase in a particular measure of uncertainty, variance. The predictive distributions derived by Bayesian methods for CVP analysis and life testing models allow the decision maker to make probability statements about future values of sales and the future life length of items. Estimates obtained from the posterior and predictive distributions are unequally weighted functions of past data.


CHAPTER  SIX 

CONCLUSIONS, LIMITATIONS, AND FURTHER STUDY

6.1 Summary

Great  effort  has  been  expended  by  engineers,  econometricians 
and  statisticians  over  the  last  two  decades  on  the  problem  of  model 
identification.  This  problem  is  concerned  with  construction  of  a  model 
whose  output  is  close  in  some  sense  to  the  observed  data  from  the  real 
system.  The  equations  which  describe  the  model  are  often  specified  to 
within a number of parameters which must be estimated. The unknown
parameters are usually assumed a priori to be constant. In this case the
problem of model identification is reduced to one of constant parameter
estimation.

The problem of time varying parameters has received more attention
during recent years because of an increased body of evidence that
the usual assumption of stable parameters often lacks realism. The
stochastic parameter variation problem arises when parameter variation
includes a component which is a realization of some random process in
addition to whatever component is related to observable variables. Ideally,
a model would be so well specified that no stochastic parameter
variation would be present, but the world is less than ideal.

In  this  dissertation  we  extend  and  generalize  an  earlier  model 
developed  by  Winkler  and  Barry  (1973)  by 

1. explicitly accounting for uncertainty with respect to both
parameters of the Bayesian normal model,
and

2. modeling nonstationarity in mean and variance for the lognormal
case, since the mean and variance of the lognormal distribution
are both functions of both $\mu$ and $\sigma^2$.

Some of the objectives of this kind of research are to gain
more precise information about the structure of economic relationships
and/or to obtain estimated relationships that are suitable for
forecasting, in particular in the areas of CVP analysis and life testing models.
The model developed in the previous chapters seems particularly appropriate
to both of these objectives, because it provides a framework for
drawing inferences about the structure of the relationship at every point
in time.

Comparing the nonstationary model with the stationary one, it
is shown that:

1. more uncertainty is present under nonstationarity than
under stationarity;

2. past observations provide relatively less information about
the current value of $\mu_t$ under nonstationarity than under stationarity,
because the particular form of stochastic parameter variation used
implies a treatment of data involving the use of all observations in
a differential weighting scheme;
and

3. under nonstationarity the limiting values of some of the
parameters of the posterior and predictive distributions cannot be
determined clearly.
The model developed in this dissertation is simple, and some
of the results are obtained under very restrictive assumptions. Probably
the most important advantage of the new work is the increased versatility
it lends to the nonstationary Bayesian model derived by Winkler and Barry
(1973), i.e., the enlarged range of real and important problems involving
univariate or multivariate nonstationary normal and lognormal processes
with which it can cope. Another advantage is that it keeps the simplicity
of the updating methods for the efficient handling of the estimation of
unknown parameters and the prediction of the outcome of a future sample.

6.2 Limitations

The results obtained from the Bayesian modeling of nonstationarity
rely on some general and simplifying assumptions that we have pointed out
throughout the dissertation. Some of these assumptions limit the results
obtained from the model. Some are assumptions that are part of the more
general Bayesian statistical inference model, and others are related
directly to the nonstationary condition.

The decisions we make, the conclusions we reach, and the explanations
we offer are usually based on beliefs concerning the probability
of uncertain events such as the result of an experiment, the outcome of
a sporting event, or the future value of an investment. In general, we do
not have objectively given models according to which the probability of
such events could be computed. As a consequence, the assessment of
uncertainty is often based on the intuitive judgments of human beings. One
important assumption of the model that we developed is that the manager
can express his judgments about the unknown parameters in terms of a
natural conjugate prior distribution for the process. The manager has to
decide which parameters are unknown, and then he must express his
information about these random variables in probabilistic terms and
according to the natural conjugate family of prior distributions. The prior
probabilities should reflect the decision maker's prior information about
the uncertain quantity in question, i.e., sample results if available; if
there is little or no sample information, then they should be based on any
other relevant information available. Several techniques are available for
the quantification of judgment; some of these were referenced in Chapters
Two and Five. For many problems, a joint distribution for the unknown
parameters is needed. If the uncertain parameters are dependent, the
assessment process becomes difficult, especially if we are dealing with
continuous random variables.

The applicability of conjugate prior distributions depends in
part on the applicability of a particular statistical model because the
conjugate family of distributions, as shown in Chapter Three, depends on
assumptions concerning a statistical model. Although the model is
originally developed for normal data generating processes, several references
are given for the applicability of lognormal models to economic and life
testing problems. There are cases in which, even if a certain model is
applicable to the data generating process and the corresponding conjugate
family is known, it may be that no member of the family adequately
represents the assessor's prior judgments.
For some of the results, we assumed that for each period a sample
of equal size, $n$, is available. This is a crucial assumption for the
limiting results discussed in Chapter Four. If a sample of equal size
is not available each time a sample is taken, then the limiting value of
$n'$ cannot be obtained without some further restrictions on the nature
of the sampling procedure actually used.

The imposition of the transition relation, $\mu_{t+1} = \mu_t + e_{t+1}$,
is critical to the determination of the prior distribution of the time
varying coefficient. We assumed that the distributions of $\mu_t$ and $e_{t+1}$
were normal, and therefore we were able to find the convolution. It is
shown in Appendix II that other assumptions, like gamma or exponential
random shocks, non-additive nonstationary models, i.e., $\mu_{t+1} = \mu_t e_{t+1}$,
and exponential data generating processes, can lead to distributions that
are not tractable and consequently not useful for the Bayesian modeling of
time varying parameters. It is also assumed in this model that no
seasonal or trend effects are present. Insofar as the model is used for
short-term forecasting, this assumption does not seem unrealistic. Further
research including these additional sources of variation could lead to a
more versatile model, although problems like those discussed in Appendix
II are likely to reduce the possibilities of obtaining a model in closed
form. [See Harrison and Stevens (1971) for some results with such a model.]
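
The reason the normal assumption keeps the additive transition tractable is that the sum of two independent normal variables is again normal, with the variances added. The sketch below illustrates only that step. It assumes that the shock is independent of the current mean and has mean zero, and it uses the convention $n' = \sigma^2/\sigma'^2$ of Appendix I to express the inflated variance as an equivalent number of observations; the function names are hypothetical.

```python
def shift_prior(m_post, var_post, shock_var):
    """Distribution of mu_{t+1} = mu_t + e_{t+1} when both terms are normal.

    Assumes e_{t+1} ~ N(0, shock_var) is independent of mu_t, so the mean is
    unchanged and the two variances add.
    """
    return m_post, var_post + shock_var


def equivalent_sample_size(sigma2, prior_var):
    """n' = sigma^2 / sigma'^2: prior information expressed as observations."""
    return sigma2 / prior_var


if __name__ == "__main__":
    sigma2 = 4.0      # known process variance (illustrative value)
    n_post = 5.0      # information about mu_t after the period-t sample
    m_next, var_next = shift_prior(12.0, sigma2 / n_post, shock_var=0.2)
    print(m_next, var_next, equivalent_sample_size(sigma2, var_next))
```

In this illustration the shock lowers the equivalent sample size carried into the next period below the posterior value of the current period, which is one way to see why uncertainty about the mean is never eliminated under this form of nonstationarity.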

The assumption that the variance of the normal process is known
seems particularly unrealistic when we are assuming that the mean is unknown.
Thus, we assumed that both parameters are unknown. However, a restrictive
assumption has to be imposed in order to permit the determination of the
new prior distribution after a random shock has occurred, i.e., that the
ratio ($n_s$) between the unknown population variance ($\sigma^2$) and the random
shock variance ($\sigma_e^2$) is known, and that although the population variance
is unknown it does not change from period to period. This is a strong
assumption and the revision process, when both parameters are unknown,
depends entirely on it. Engineers are often able to specify these
parameters from direct physical information, but econometricians are seldom
so fortunate, and the identification and estimation problems are much
more severe in an economic context.

6.3 Suggestions for Further Research

The transition relationship of the time varying parameter, $\mu_t$,
was assumed to be constant and known throughout the dissertation. As we
said before, this assumption is a key factor in the determination of the
prior distribution of the time varying coefficients. The choice of a
particular structure reflects the decision maker's prior beliefs about
the relationship between consecutive time varying parameters. Some
models have been suggested to describe this relationship. If there is
an a priori belief that the parameters drift slowly across time, the
following model might seem more appropriate [see Sarris (1973)]:
$\mu_{t+1} = 2\mu_t - \mu_{t-1} + e_{t+1}$. The model developed in this dissertation could
be extended to include the problem of identifying and selecting the
transition structure.
We have assumed that a change in the process mean takes place
during each period and that the magnitude of that change, $e$, has a
distribution $N(0, \sigma_e^2)$. Although this has been a convenient assumption, it
perhaps lacks realism. A more realistic assumption would seem to be
that an assignable cause (and hence a change in the process mean)
occurs according to a Poisson process. Carter (1972) approached this
problem assuming that $\sigma^2$, the population variance, was known. The
methodology described in this dissertation for the case in which both
parameters are unknown could be used, incorporating this new assumption
into the problem.

The problem of nonstationarity could be approached from a
different angle. Suppose that the time varying parameters $\mu_t$ and $\mu_{t+1}$
are independent and identically distributed, conditional upon some
second order parameter(s), instead of being related in a stochastic
manner. In a problem like this the decision maker is making inferences
about the distribution of $\mu_t$, which sometimes is called the
distribution of nonstationarity. For instance, if $\mu_t$ is the mean for period
$t$ of a normal data generating process for sales of a given company,
then the distribution of nonstationarity might represent the different
values of $\mu_t$ over time. In general, the distribution of nonstationarity
will have a parameter (or a vector of parameters), often denoted by $\phi$,
so that the distribution of nonstationarity can be represented by
$f(\mu_t \mid \phi)$ for all $t$. A Bayesian approach to this problem requires the
specification of a probability distribution $f(\phi)$ in order to express
the decision maker's uncertainty about $\phi$. This problem can be studied
under various uncertainty and distributional assumptions concerning the
distribution of nonstationarity and concerning the distribution of second
order parameters, i.e., $f(\phi)$. This problem is related to the problem studied
by a class of theorists known as Empirical Bayesians [see Maritz (1970)].

Another application of the model developed in this dissertation
relates to calibration of instruments. Suppose that a product is being
weighed. During period $t$ a sample is taken to estimate the average weight
of the products, $\mu_t$. As the average appears to be high or low, a dial can
be set to increase or reduce the average weight of the products by an
amount $\bar{e}_t$. If we assume that the dial is poorly calibrated, i.e., $e_t$
becomes a random variable, then when we change the dial we do not get
$\mu_t + \bar{e}_t$ but rather $\mu_t + e_t$, where $E(e_t) = \bar{e}_t$. Since the setting varies,
$\bar{e}_t$ will vary and hence the expected mean weight of the products for the
next period of time, $E(\mu_{t+1})$, will vary. The expected value of $e_t$, $\bar{e}_t$,
might be subject to control, so that a decision problem arises. Each
period of time a setting must be selected that minimizes the variance
of the average weight, or that minimizes the predictive variance of the
weight for a future product that is sampled, or that satisfies a
probabilistic constraint on the next weights of items produced by the process.

Perhaps the most important area for further work has to do with
identification of the nonstationarity. We have stressed throughout the
dissertation that it is important for the decision maker to recognize
the presence of nonstationarity if it exists. However, most of the time
it is very difficult to get information about the general form of
nonstationarity. Analyzing data for evidence of changes in parameter
conditions is a problem central to the development of an inferential
system that the decision maker can use. The decision maker has available
the sequence of sample means $m_1, m_2, \ldots, m_t, \ldots$. More research is
needed to find out how those sample means could be helpful in determining
what form of nonstationarity is present and what its variability is. In
the previous section we pointed out that, when the parameters $\mu$ and $\sigma^2$
are unknown, our model depended on the assumption that the ratio ($n_s$)
between the unknown population variance and the random shock variance
is known. In most cases the decision maker does not know this value
and needs to estimate it. Additional research is required to find out
how to use the sample means and the sample variance to estimate $n_s$.


In conclusion, since assumptions of stationarity are often quite
unrealistic, the introduction of possible nonstationarity greatly
increases the realism and the applicability of statistical inference
methods, in particular of Bayesian procedures. More work, of both an
empirical and analytical nature, appears to be promising.
APPENDIX I

APPENDIX TO CHAPTER THREE
Bayesian Analysis of Normal and Lognormal Processes

The general Bayesian theory presented in subsection 3.2.1
provides the foundation for the analysis of normal and lognormal processes to
be considered in this appendix. Most of this work appears in Raiffa and
Schlaifer (1961) and in De Groot (1970). It sets the stage for our analysis
of normal and lognormal processes under nonstationarity in Section 3.3.
Two uncertainty conditions are to be studied in detail: in one the shift
parameter, $\mu$, is unknown and the spread parameter, $\sigma^2$, is assumed to be
known, and in the other case both parameters are assumed to be unknown.
Prior, posterior and predictive distributions will be determined for both
cases. In every case sufficient statistics will be found for the unknown
parameters.

I.1 Normal and Lognormal Processes with Known Spread Parameters

The purpose of an experiment is to obtain information about $\mu$ or
$\sigma^2$, depending upon which (if either) is known beforehand. Consider
experiments consisting of $n$ independent and identically distributed
observations $x_1, x_2, \ldots, x_n$ obtained from a normal process; that is, a
process generating random variables $x_1, x_2, \ldots, x_n$ with identical densities

(AI.1)   $f_N(x \mid \mu, \sigma^2) = (\sqrt{2\pi}\,\sigma)^{-1} \exp[-(x-\mu)^2/2\sigma^2]$,   $-\infty < x < \infty$,
         $-\infty < \mu < \infty$,
         $\sigma > 0$.

The likelihood that an independent normal process will generate $n$
successive values $x_1, x_2, \ldots, x_n$ is the product of their individual
likelihoods as given by (AI.1) if the stopping process is
noninformative. (See La Valle (1970) for a general discussion of stopping
rules.) In other words, it is the product of their individual
likelihoods if the kernel of the likelihood function for the parameter
depends only on the data generating process and not on the stopping
process. We will assume that the stopping process is noninformative.
Therefore the likelihood can be written as

(AI.2)   $l(x \mid \mu, \sigma^2) = \prod_{i=1}^{n} \{ [\sqrt{2\pi}\,\sigma]^{-1} \exp[-(x_i-\mu)^2/2\sigma^2] \}$

or

(AI.3)   $= [\sqrt{2\pi}\,\sigma]^{-n} \exp\{-[\sum_{i=1}^{n} (x_i-\mu)^2]/2\sigma^2\}$.

If we assume that $\mu$ is unknown, then we can compute the statistic
$m$ defined as

(AI.4)   $m = (\sum_{i=1}^{n} x_i)/n$.

The likelihood can be written as

(AI.5)   $l(x \mid \mu) = (\sqrt{2\pi}\,\sigma)^{-n} (\exp\{-[\sum_{i=1}^{n} (x_i-m)^2]/2\sigma^2\}) \exp[-n(m-\mu)^2/2\sigma^2]$

(AI.6)   $\propto \exp[-n(m-\mu)^2/2\sigma^2]$.

Thus all the information in the sample is conveyed by the statistics
$m$, the sample mean, and $n$, the sample size. Since the data enter Bayes'
formula only through the likelihood, it follows that all other aspects
of the data, with the exception of $m$, are irrelevant in determining
the posterior distribution of $\mu$ and hence in making inferences about
$\mu$.
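
The sufficiency argument can be illustrated numerically: two samples with the same $m$ and $n$ give the same relative support to any two values of $\mu$, because the factor involving $\sum (x_i - m)^2$ does not depend on $\mu$. The following is a small sketch under the known-variance assumption of this section; the function names and the sample values are hypothetical.

```python
import math


def normal_log_likelihood(xs, mu, sigma2):
    """Log of the joint normal likelihood (AI.3) for the sample xs."""
    n = len(xs)
    squared_dev = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - squared_dev / (2 * sigma2)


def log_kernel(m, n, mu, sigma2):
    """Log of the kernel (AI.6), the only factor that involves mu."""
    return -n * (m - mu) ** 2 / (2 * sigma2)


def log_likelihood_ratio(xs, mu_a, mu_b, sigma2):
    """Support for mu_a relative to mu_b given the same sample."""
    return normal_log_likelihood(xs, mu_a, sigma2) - normal_log_likelihood(xs, mu_b, sigma2)


if __name__ == "__main__":
    spread_out = [1.0, 2.0, 6.0]   # m = 3, n = 3
    identical = [3.0, 3.0, 3.0]    # m = 3, n = 3
    print(log_likelihood_ratio(spread_out, 2.5, 4.0, 2.0))
    print(log_likelihood_ratio(identical, 2.5, 4.0, 2.0))
    print(log_kernel(3.0, 3, 2.5, 2.0) - log_kernel(3.0, 3, 4.0, 2.0))
```

All three printed quantities agree up to floating-point rounding, even though the two samples differ in every observation.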

Raiffa and Schlaifer (1961) show that when the variance, $\sigma^2$,
of an independent normal process is known but the mean is treated as
a random variable, the most convenient distribution of $\mu$, the
natural conjugate prior, is the normal distribution defined by

(AI.7)   $f_N(\mu \mid m', \sigma'^2) = \{\exp[-(\mu-m')^2/2\sigma'^2]\}/\sigma'\sqrt{2\pi}$,   $-\infty < \mu < \infty$,
         $\sigma'^2 > 0$.

In the particular case of an unknown mean, the likelihood of $\mu$ is
a normal curve completely known a priori except for location, which
is determined by $m$. That is, the likelihood is data translated in
the original metric $\mu$, and therefore a noninformative prior is
locally uniform in $\mu$ itself.

To simplify our results, let $\sigma'^2 = \sigma^2/n'$; that is, we define
the parameter $n'$ by

(AI.8)   $n' = \sigma^2/\sigma'^2$

and say that the information, $(m', \sigma'^2)$, contained in the prior
distribution of $\mu$ is equivalent to $n'$ observations on the process. In
other words, let the prior distribution be

(AI.9)   $f_N(\mu \mid m', \sigma^2, n') = \{\exp[-n'(\mu-m')^2/2\sigma^2]\}/\sigma\sqrt{2\pi/n'}$.

If a normal distribution with parameters $n'$ and $m'$ is assigned to $\mu$
and if a sample then yields a sufficient statistic $(m, n)$, then the
posterior distribution of $\mu$ will be a normal distribution with
parameters

(AI.10)   $m'' = (n'm' + nm)/(n' + n)$
and

(AI.11)   $n'' = n' + n$.

It can be seen in (AI.10) that $m''$ is the weighted average of the prior
and sample means. Therefore, we may conveniently regard the mean of
the posterior distribution as a weighted average of an estimate of $\mu$
formed from the sample and an estimate of $\mu$ formed from the prior
distribution. The weights of $m'$ and $m$ in this weighted average are
proportional to $n'$ and $n$, respectively. If $n' > n$, the prior mean is
given more weight, and the posterior mean $m''$ is closer to $m'$ than to $m$.
If $n' < n$, the sample mean is given more weight, and $m''$ is closer to $m$
than to $m'$. Since we stated that $n'$ could be thought of as the sample
size required to produce a variance of $\sigma'^2$ for a sample mean, $m$, it
appears that in pooling the two samples, the one with the larger sample
size receives more weight in the determination of the weighted mean.
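
The revision rule in (AI.10) and (AI.11) is short enough to state as code. This is a minimal sketch with a hypothetical function name and illustrative numbers, and it assumes the process variance $\sigma^2$ is known.

```python
def update_known_variance(m_prior, n_prior, m_sample, n_sample):
    """Posterior parameters (m'', n'') from (AI.10) and (AI.11)."""
    n_post = n_prior + n_sample
    m_post = (n_prior * m_prior + n_sample * m_sample) / n_post
    return m_post, n_post


def posterior_variance(sigma2, n_post):
    """Variance of the posterior distribution of mu, sigma^2 / n''."""
    return sigma2 / n_post


if __name__ == "__main__":
    # Prior worth 4 observations centred at 10; one new observation at 20.
    print(update_known_variance(10.0, 4, 20.0, 1))
    # Prior worth 1 observation centred at 10; four observations averaging 20.
    print(update_known_variance(10.0, 1, 20.0, 4))
```

In the first call the posterior mean stays closer to the prior mean, and in the second it moves closer to the sample mean, in line with the weighting described above.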

The form of the variance of the posterior distribution of $\mu$
is particularly simple. The denominator of the variance expression
increases by a constant amount with each observation that is taken,
regardless of the observed values. Therefore, as the number of
observations increases, the distribution of $\mu$ becomes more concentrated
around its mean. Moreover, the concentration must increase in a
fixed, predetermined way, while the values of the expectation of the
mean will depend on the observed values.

If the $n$ random variables $x_1, \ldots, x_n$ represent a random
sample of size $n$ from a normally distributed population with mean $\mu$
and variance $\sigma^2$, then the sample mean $m$ is normally distributed with
conditional mean $E(m \mid \mu, \sigma^2) = \mu$ and conditional variance
$V(m \mid \mu, \sigma^2) = \sigma^2/n$.
Since the variance of the prior distribution is equal to $\sigma^2/n'$ and
the variance of the sample mean is equal to $\sigma^2/n$, we notice that, in
the posterior distribution, the prior information receives more weight
than the sample information if the prior variance is less than the
variance of $m$ (i.e., $n' > n$). If there is more prior information than
sample information (where information can be viewed as inversely
related to variance), then the posterior distribution is affected
more by the prior information, summarized by the prior distribution,
than by the sample information, summarized by the likelihood function.

To find the predictive distribution for the case under
consideration, we have to evaluate the expression

(AI.12)   $f(x) = \int_{-\infty}^{\infty} f_N(x \mid \mu, \sigma^2)\, f_N(\mu \mid m'', n'')\, d\mu$.

Substituting into the expression the corresponding functions
and integrating out $\mu$, Raiffa and Schlaifer (1961) show that the
predictive distribution function is a normal distribution with mean
$m''$ and variance

(AI.13)   $\sigma^2 (n'' + 1)/n'' = \sigma^2 + (\sigma^2/n'')$.

Thus the predictive variance reflects both the process variance $\sigma^2$
and uncertainty about $\mu$, measured by $\sigma^2/n''$.
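
The two components of the predictive variance in (AI.13) can be made explicit in a short sketch. The function name and the numerical values are hypothetical.

```python
def predictive_known_variance(m_post, n_post, sigma2):
    """Mean and variance of the normal predictive distribution, as in (AI.13)."""
    process_part = sigma2              # variability of the process itself
    parameter_part = sigma2 / n_post   # remaining uncertainty about mu
    return m_post, process_part + parameter_part


if __name__ == "__main__":
    print(predictive_known_variance(12.0, 5, 4.0))
    # As n'' grows, only the process variance remains.
    print(predictive_known_variance(12.0, 10**9, 4.0))
```

Under stationarity the second component shrinks toward zero as $n''$ grows, so the predictive variance approaches $\sigma^2$.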

We are also interested in studying experiments consisting
of $n$ independent and identically distributed observations $x_1, x_2, \ldots, x_n$
obtained from a lognormal process, that is, a process generating random
variables $x_1, x_2, \ldots, x_n$ with identical densities

(AI.14)   $f_{LN}(x \mid \mu, \sigma^2) = \{\exp[-(\ln x - \mu)^2/2\sigma^2]\}/x\sigma\sqrt{2\pi}$,   $x > 0$,
          $-\infty < \mu < \infty$,
          $\sigma > 0$.

It was stated in Chapter Two that a random variable $x$ is said
to be lognormal if and only if $\ln x$ is normal. That is, suppose that
$\ln x$ is normal with unknown mean $\mu_L$ and known variance $\sigma_L^2$. Denoting
by $f_N(\ln x \mid \mu_L, \sigma_L^2)$ the value of the normal density at $\ln x$ and by
$f_{LN}(x \mid \mu_L, \sigma_L^2)$ the value of the lognormal density at $x$, it follows that

(AI.15)   $f_{LN}(x \mid \mu_L, \sigma_L^2) = f_N(\ln x \mid \mu_L, \sigma_L^2)/x$.

Thus, working in terms of the variable $\ln x$, the preceding analysis
in terms of the normal process can be applied to obtain results that
apply to the lognormal distribution.

When it is assumed, in a lognormal distribution, that $\sigma$ is
known and that $\mu$ is unknown, it follows that the sufficient
statistics are

(AI.16)   $m = (\sum_{i=1}^{n} \ln x_i)/n$

and $n$. The natural conjugate prior for the unknown parameter $\mu$ is
normal with parameters $m'$ and $n'$ as in the normal case. The revision
of the prior distribution is similar to the normal case also. If a
normal distribution with parameters $m'$ and $n'$ is assigned to $\mu$
and if a sample then yields a sufficient statistic $(m, n)$, then the
posterior distribution of $\mu$ will be normal with parameters

(AI.17)   $m'' = (n'm' + nm)/(n' + n)$,
and

(AI.18)   $n'' = n' + n$.

The predictive distribution will be lognormal with parameters $m''$ and
$\sigma^2 (n'' + 1)/n''$.
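
Because the lognormal analysis is carried out on $\ln x$, the same revision rule can be reused after taking logarithms. The sketch below is illustrative only; the function names are hypothetical, and the returned predictive parameters refer to the distribution of $\ln x$ (they are the parameters of the lognormal predictive distribution, not the mean and variance of $x$ itself).

```python
import math


def lognormal_update(m_prior, n_prior, observations):
    """Posterior (m'', n'') of (AI.17)-(AI.18) from positive observations."""
    logs = [math.log(x) for x in observations]  # work in the metric ln x
    n = len(logs)
    m = sum(logs) / n                           # sufficient statistic (AI.16)
    n_post = n_prior + n
    m_post = (n_prior * m_prior + n * m) / n_post
    return m_post, n_post


def lognormal_predictive_params(m_post, n_post, sigma2):
    """Parameters m'' and sigma^2 (n'' + 1) / n'' of the lognormal predictive."""
    return m_post, sigma2 * (n_post + 1) / n_post


if __name__ == "__main__":
    m_post, n_post = lognormal_update(0.0, 2, [math.e, math.e ** 3])
    print(m_post, n_post)
    print(lognormal_predictive_params(m_post, n_post, 0.5))
```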

I.2 Normal and Lognormal Processes with Both Parameters Unknown

We shall now consider the important problem of sampling from
a normal distribution for which both mean and variance are unknown.
A conjugate family for this problem must be a family of bivariate
distributions. Suppose that $x_1, x_2, \ldots, x_n$ is a random sample from
a normal distribution with an unknown value of the mean, $\mu$, and an
unknown value of the variance, $\sigma^2$. The likelihood that an independent
normal process will generate such a sample is given in (AI.3), if the
stopping process is noninformative. Now, if we define the statistics


(AI.19)   $m = (\sum_{i=1}^{n} x_i)/n$,

and

(AI.20)   $v = (\sum_{i=1}^{n} (x_i - m)^2)/(n-1)$,   ($\equiv 0$ if $n = 1$),

the likelihood (AI.3) can be rewritten as

(AI.21)   $l(x \mid \mu, \sigma^2) = (2\pi)^{-n/2} \{\exp[-\{(n-1)v/2\sigma^2\} - \{n(m-\mu)^2/2\sigma^2\}]\} (\sigma^2)^{-n/2}$.

All the information in the sample is conveyed by the statistics $m$, $v$,
and $n$; i.e., $(m, v, n)$ is sufficient. The kernel of the likelihood is

(AI.22)   $\{\exp[-\{(n-1)v/2\sigma^2\} - \{n(m-\mu)^2/2\sigma^2\}]\} (\sigma^2)^{-n/2}$.

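Raiffa and Schlaifer (1961) show that under these assumptions,
the natural conjugate family of prior distributions for the two random
variables, $\mu$ and $\sigma^2$, is a normal-gamma-2 distribution defined by

(AI.23)   $f_{N\text{-}\gamma\text{-}2}(\mu, \sigma^2 \mid m, v, n) = f_N(\mu \mid \sigma^2, m, n)\, f_{\gamma\text{-}2}(\sigma^2 \mid v, n)$;

that is

(AI.24)   $f_{N\text{-}\gamma\text{-}2}(\mu, \sigma^2 \mid m, v, n) = \{\sqrt{n/2\pi\sigma^2}\, \exp[-n(\mu-m)^2/2\sigma^2]\} \{\exp[-(n-1)v/2\sigma^2]\, [v(n-1)/2\sigma^2]^{(n-1)/2 - 1}\, [(n-1)v/2]/\Gamma((n-1)/2)\}$,
          $-\infty < \mu < \infty$,
          $n, v > 0$.
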
They recommend also that, in order to improve the richness of the prior
joint distribution, we could define a new parameter, $d$, which could be
called the number of degrees of freedom in the statistic $v$. It does
not have to be equal to $n-1$ since $f_N(\mu \mid \sigma^2, m, n)$ and $f_{\gamma\text{-}2}(\sigma^2 \mid v, d)$ are
distinct. The prior joint distribution could then be defined as

(AI.25)   $f_{N\text{-}\gamma\text{-}2}(\mu, \sigma^2 \mid m, v, n, d) = f_N(\mu \mid \sigma^2, m, n)\, f_{\gamma\text{-}2}(\sigma^2 \mid v, d)$.


We want to point out that, if we are concerned with a noninformative
prior, then in order to find this tractable prior distribution the
metric $\log \sigma$ and not $\sigma$ should be used. In other words, the metric
(transformation) $\log \sigma$ permits us to have a prior distribution of $\mu$ and
$\sigma^2$ that is locally uniform (noninformative) with respect to the
likelihood. However, there is no such restriction when we are working
with informative priors.

Next we want to present the marginal distributions of $\sigma^2$ and $\mu$,
since we will make use of them in Section 3.3, where we develop a model
for nonstationarity in normal and lognormal processes. If the joint
distribution of the random variables $(\mu, \sigma^2)$ is normal-gamma-2 as defined
before, Box and Tiao (1972) show that the marginal distribution of $\sigma^2$
is gamma-2 with parameters $v$ and $d$, that is

(AI.26)   $f_{\gamma\text{-}2}(\sigma^2 \mid v, d) = \{\exp[-dv/2\sigma^2]\}\, [vd/2\sigma^2]^{d/2 - 1}\, [dv/2]/\Gamma(d/2)$,   $\sigma^2 > 0$,
          $v, d > 0$.

They also show that the marginal distribution of $\mu$ is the Student
distribution with parameters $(m, v, d, n)$, that is

(AI.27)   $f_S(\mu \mid m, n/v, d) = d^{d/2}\, [d + \{n(\mu-m)^2/v\}]^{-(d+1)/2}\, \sqrt{n/v}\,/\,B(1/2, d/2)$,   $-\infty < \mu < \infty$,
          $d, (n/v) > 0$,

where $B(p, q)$ is the complete beta function.

If a sample yields a sufficient statistic $(m, v, n, d)$ and a
normal-gamma-2 prior with parameters $(m', v', n', d')$ is assigned to
$\mu$ and $\sigma^2$, then the posterior distribution will be normal-gamma-2
with parameters $m''$, $n''$, $d''$, $v''$ given by

(AI.28)   $m'' = (n'm' + nm)/(n' + n)$,

(AI.29)   $n'' = n' + n$,

(AI.30)   $d'' = d' + n$,

and

(AI.31)   $v'' = (d'v' + n'm'^2 + dv + nm^2 - n''m''^2)/(d' + n)$.
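
The four revision formulas (AI.28) to (AI.31) can be collected into one function. The sketch below is illustrative: the function name and tuple layout are hypothetical, and it takes the sample degrees of freedom as $d = n - 1$, with $v = 0$ when $n = 1$ as in (AI.20).

```python
def update_both_unknown(prior, xs):
    """Normal-gamma-2 revision (AI.28)-(AI.31).

    prior is the tuple (m', v', n', d'); xs is the new sample.
    Returns the posterior tuple (m'', v'', n'', d'').
    """
    m_p, v_p, n_p, d_p = prior
    n = len(xs)
    m = sum(xs) / n
    d = n - 1                                   # sample degrees of freedom
    v = sum((x - m) ** 2 for x in xs) / d if n > 1 else 0.0
    n_post = n_p + n                            # (AI.29)
    m_post = (n_p * m_p + n * m) / n_post       # (AI.28)
    d_post = d_p + n                            # (AI.30)
    v_post = (d_p * v_p + n_p * m_p ** 2 + d * v + n * m ** 2
              - n_post * m_post ** 2) / d_post  # (AI.31)
    return m_post, v_post, n_post, d_post


if __name__ == "__main__":
    prior = (0.0, 1.0, 1, 1)
    print(update_both_unknown(prior, [1.0, 3.0]))
    # Processing the same observations one at a time gives the same result.
    step = update_both_unknown(prior, [1.0])
    print(update_both_unknown(step, [3.0]))
```

With these formulas, revising on the whole sample at once and revising one observation at a time lead to the same posterior parameters, which is what makes period-by-period updating convenient.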


To find the predictive distribution of the random variable
x, we have to evaluate the expression

(AI.32)   $f(x) = \int_0^{\infty} \int_{-\infty}^{\infty} f_N(x \mid \mu, \sigma^2)\, f_{N\gamma 2}(\mu, \sigma^2 \mid m'', n'', d'', v'')\, d\mu\, d\sigma^2 .$

Substituting the corresponding functions into the expression and
integrating out $\mu$ and $\sigma^2$, Raiffa and Schlaifer (1961) show that the
predictive distribution is a Student distribution, defined as

(AI.33)   $f_S\big(x \mid m'', n''/[v''(n''+1)], d''\big) = \dfrac{(d'')^{d''/2}\,\big[d'' + n''(x - m'')^2/(v''(n''+1))\big]^{-(d''+1)/2}}{B[(1/2), (d''/2)]}\,\sqrt{\dfrac{n''}{v''(n''+1)}}, \qquad -\infty < x < \infty, \quad d'', \dfrac{n''}{v''(n''+1)} > 0.$
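
As a numerical illustration of (AI.33), the following Python sketch
evaluates the Student density in the form used above and checks that it
integrates to approximately one. The function names and the posterior
parameter values are hypothetical, and the trapezoid sum is only a rough
sanity check, not part of the original derivation.

```python
import math


def student_pdf(x, m, h, d):
    """Student density in the form of (AI.27)/(AI.33).

    m is the location, h the scale factor (n/v or n"/(v"(n"+1))),
    and d the degrees of freedom.
    """
    log_beta = (math.lgamma(0.5) + math.lgamma(d / 2.0)
                - math.lgamma((d + 1.0) / 2.0))      # log B(1/2, d/2)
    core = (d + h * (x - m) ** 2) ** (-(d + 1.0) / 2.0)
    return d ** (d / 2.0) * core * math.sqrt(h) / math.exp(log_beta)


def predictive_pdf(x, m2, v2, n2, d2):
    """Predictive density (AI.33) for posterior parameters (m", v", n", d")."""
    return student_pdf(x, m2, n2 / (v2 * (n2 + 1.0)), d2)


# Hypothetical posterior parameters; trapezoid rule on [-200, 200].
step = 0.01
grid = [-200.0 + step * i for i in range(40001)]
values = [predictive_pdf(x, 1.6, 1.14, 5.0, 5.0) for x in grid]
total = step * (sum(values) - 0.5 * (values[0] + values[-1]))
print(round(total, 6))
```

With d = 1 and h = 1 the same function reduces to the Cauchy density, whose
value at the location is 1/pi, which gives a quick independent check of the
normalizing constant.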


The prior-posterior analysis of the lognormal distribution under
the assumption that both parameters are unknown is, as we mentioned
before, very similar to its normal counterpart. The sufficient
statistics are

$m = \Big(\sum_{i=1}^{n} \ln x_i\Big)/n,$

$v = \Big(\sum_{i=1}^{n} (\ln x_i - m)^2\Big)/(n - 1),$

and n.
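
A minimal Python sketch of these sufficient statistics follows. The
function name and the sample values are hypothetical; the only assumption
about the data is that every observation is strictly positive, so that its
logarithm exists.

```python
import math


def lognormal_sufficient_statistics(xs):
    """Return (m, v, n) computed from the logarithms of positive data."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    m = sum(logs) / n                                  # mean of ln x
    v = sum((y - m) ** 2 for y in logs) / (n - 1)      # variance of ln x
    return m, v, n


# Hypothetical sample whose logarithms are 0, 1 and 2.
print(lognormal_sufficient_statistics([1.0, math.e, math.e ** 2]))
```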

The natural conjugate prior distribution for both unknown parameters
is the normal-gamma-2 as defined in (AI.25), and the marginal
distributions are gamma-2 and Student for the parameters $\sigma^2$ and $\mu$
respectively. A posterior analysis will lead us to a normal-gamma-2
posterior distribution with parameters revised as in (AI.28)-(AI.31). The
predictive distribution of ln x is Student and hence the predictive
distribution of x is logStudent. Ohlson (1977) shows that if the
logarithms of the values of a random variable follow a t-model, then
the expected value and the variance are infinite. Thus the predictive
distribution of x, in our case where both parameters are unknown, has
infinite mean and variance. In Chapter Four we will discuss the
implications of these properties for our statistical inferential model.

APPENDIX II

APPENDIX TO CHAPTER THREE
Nonstationary Models for the Exponential Distribution

In Chapter Two we pointed out that the exponential distribution
is frequently used to represent life testing models. All the
research in the area of life testing where the distribution has been
used has assumed stationary conditions for the parameters of the
model. We attempted to model nonstationarity for this distribution using
two different noise models, but the attempt proved fruitless. Only under
very trivial assumptions did the analysis yield tractable results. For
the more interesting and realistic assumptions, we will show in this
appendix that useful results cannot be developed. In particular these
two noise models were considered: one assumes that the value of the
parameter of interest, say $\lambda$, at time period t+1 is equal to the value
at time t plus a random term, i.e.,

(AII.1)       $\lambda_{t+1} = \lambda_t + \epsilon_{t+1}, \qquad t = 1, 2, \ldots ;$

the other noise model assumes that the value of the parameter $\lambda$ at time
period t+1 is equal to the value at time t times a random term, i.e.,

(AII.2)       $\lambda_{t+1} = \lambda_t\, \epsilon_{t+1}, \qquad t = 1, 2, \ldots .$

Consider experiments consisting of n independent and identically
distributed observations $x_1, x_2, \ldots, x_n$ obtained from an exponential
process; that is, a process generating random variables $x_1, x_2, \ldots, x_n$
with identical densities

(AII.3)       $f(x \mid \lambda) = \lambda \exp(-\lambda x), \qquad x > 0, \quad \lambda > 0.$
When the stopping process is noninformative the natural conjugate prior
distribution of the unknown parameter $\lambda$ is the gamma distribution with
parameters a and b; i.e.,

(AII.4)       $f_{\gamma}(\lambda \mid a, b) = \lambda^{a-1} \exp[-\lambda/b]\,/\,\Gamma(a)\, b^{a}, \qquad 0 < \lambda < \infty, \quad a > 0, \quad b > 0.$

In any given period t, with the prior on $\lambda_t$ and with the sufficient
statistics from the sample we could find the posterior distribution on
$\lambda_t$, which will be a gamma with parameters a" and b". At the end of
period t, if there are nonstationary parameters, we use the posterior
distribution on $\lambda_t$ and the relation between $\lambda_t$ and $\epsilon_{t+1}$ to get the
prior distribution of the unknown parameter at the start of the next
period.
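
The text does not write out a" and b". Under the parameterization of
(AII.4), in which b is a scale parameter, the standard conjugate result is
a" = a + n and 1/b" = 1/b + (sum of the observations). The Python sketch
below assumes that standard result; the function name and the data are
hypothetical.

```python
def update_gamma_prior(a, b, xs):
    """Posterior (a", b") for exponential data under the gamma prior (AII.4).

    The likelihood is proportional to lambda**n * exp(-lambda * sum(xs)),
    so the shape grows by n and the reciprocal scale grows by sum(xs).
    """
    n = len(xs)
    total = sum(xs)
    return a + n, b / (1.0 + b * total)


# Hypothetical prior and three observations.
print(update_gamma_prior(2.0, 1.0, [0.5, 1.5, 2.0]))
```

Note that b" generally differs from b after every sample, which is why the
equal-scale assumption examined next is restrictive.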

Assume that a gamma random shock is imposed on the unknown
mean, $\lambda$, of the exponential data generating process; that is

(AII.5)       $f_{\gamma}(\epsilon \mid \alpha, \beta) = \epsilon^{\alpha-1} \exp[-\epsilon/\beta]\,/\,\Gamma(\alpha)\, \beta^{\alpha}, \qquad 0 < \epsilon < \infty, \quad \alpha > 0, \quad \beta > 0.$

Furthermore assume that equation (AII.1) describes the nonstationary
random shock. Two cases are worth examining under this scenario:
in the first case we additionally assume that the scale parameters
are equal, i.e., $\beta = b$; and in the other case we place no restrictions
whatsoever on the parameters. Clearly case one is a special
case of case two.

Mood, Graybill and Boes (1974) state that if $T_1$ and $T_2$ are
independent continuous random variables and if $z = T_1 + T_2$, then the
convolution has a density function given by

(AII.6)       $f(z) = \int_{-\infty}^{\infty} f_{T_1}(z - T_2)\, f_{T_2}(T_2)\, dT_2 .$

Since $\lambda$ and $\epsilon$ are necessarily positive, their convolution will
have a density function

(AII.7)       $g(z) = \int_0^{z} f_{\epsilon}(z - \lambda \mid \alpha, \beta)\, f_{\lambda}(\lambda \mid a, b)\, d\lambda .$

But in case one, the scale parameters are assumed to be equal to a
constant, say c. Thus equation (AII.7) becomes

(AII.8)       $g(z) = \int_0^{z} (z - \lambda)^{\alpha-1} \exp[-(z - \lambda)/c]\, \exp[-\lambda/c]\, \lambda^{a-1}\,/\,\Gamma(\alpha)\, c^{\alpha}\, \Gamma(a)\, c^{a}\ d\lambda .$

Since z is fixed and $\lambda$ cannot be greater than z we could define a
new variable,

(AII.9)       $\lambda = uz, \qquad 0 < \lambda < z,$

or

(AII.10)      $u = \lambda/z, \qquad 0 < u < 1.$

Substituting (AII.10) in equation (AII.8) and simplifying, g(z)
becomes

(AII.11)      $g(z) = z^{a+\alpha-1} \exp[-z/c]\,/\,\Gamma(a + \alpha)\, c^{a+\alpha} .$

Thus the prior distribution of the mean at the beginning of
time period t+1 is gamma again with parameters (a' = a" + $\alpha$; c), where
a" is the posterior value of the parameter written a in (AII.8).
However, the assumption that the scale parameters are equal makes this
result not very useful. It is much more reasonable to think that the
distributions of $\lambda_t$ and of $\epsilon_{t+1}$ have not only different parameters a
and $\alpha$ but also that they have different scale parameters b and $\beta$.
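
For reference, the simplification from (AII.8) under the substitution
(AII.9)-(AII.10) can be written out step by step. This is a worked sketch
using only the beta integral and the identity
$B(a, \alpha) = \Gamma(a)\Gamma(\alpha)/\Gamma(a + \alpha)$; it is offered as a check on the
stated gamma result rather than as the author's own derivation.

```latex
\begin{align*}
g(z) &= \frac{\exp[-z/c]}{\Gamma(\alpha)\,\Gamma(a)\,c^{a+\alpha}}
        \int_0^{z} (z-\lambda)^{\alpha-1}\,\lambda^{a-1}\,d\lambda \\
     &= \frac{\exp[-z/c]\;z^{a+\alpha-1}}{\Gamma(\alpha)\,\Gamma(a)\,c^{a+\alpha}}
        \int_0^{1} (1-u)^{\alpha-1}\,u^{a-1}\,du
        \qquad (\lambda = uz,\ d\lambda = z\,du) \\
     &= \frac{\exp[-z/c]\;z^{a+\alpha-1}}{\Gamma(\alpha)\,\Gamma(a)\,c^{a+\alpha}}\;
        \frac{\Gamma(a)\,\Gamma(\alpha)}{\Gamma(a+\alpha)} \\
     &= \frac{z^{a+\alpha-1}\exp[-z/c]}{\Gamma(a+\alpha)\,c^{a+\alpha}} .
\end{align*}
```

The two exponentials in (AII.8) combine to $\exp[-z/c]$ only because both
scale parameters equal c; with unequal scales a factor depending on $\lambda$
remains inside the integral.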

The convolution z of the random variable $\lambda$, given by equation
(AII.4), and the random variable $\epsilon$, given by equation (AII.5), when
all the parameters are different could be written as

(AII.12)      $g(z) = \int_0^{z} (z - \lambda)^{\alpha-1} \exp[-(z - \lambda)/\beta]\, \lambda^{a-1} \exp(-\lambda/b)\,/\,\Gamma(\alpha)\, \beta^{\alpha}\, \Gamma(a)\, b^{a}\ d\lambda ,$

or

(AII.13)      $g(z) = \dfrac{\exp(-z/\beta)}{\Gamma(\alpha)\, \beta^{\alpha}\, \Gamma(a)\, b^{a}} \int_0^{z} (z - \lambda)^{\alpha-1} \exp\big(\lambda[(1/\beta) - (1/b)]\big)\, \lambda^{a-1}\, d\lambda .$

Gradshteyn and Ryshik (1965) show that

(AII.14)      $\int_0^{u} x^{\nu-1} (u - x)^{\mu-1} \exp(\beta x)\, dx = B(\mu, \nu)\, u^{\mu+\nu-1}\, {}_1F_1(\nu; \mu + \nu; \beta u),$

where $B(\mu, \nu)$ is the beta function,
and ${}_1F_1(\nu; \mu + \nu; \beta u)$ is a degenerate hypergeometric function
which does not have a closed form. Substituting (AII.14) in equation
(AII.13) yields

(AII.15)      $g(z) = g(\lambda_{t+1}) = \dfrac{\exp[-\lambda_{t+1}/\beta]\, \lambda_{t+1}^{a+\alpha-1}\, {}_1F_1\big(a; a + \alpha; [(1/\beta) - (1/b)]\lambda_{t+1}\big)}{\beta^{\alpha}\, b^{a}\, \Gamma(a + \alpha)} .$

It is clear from expression (AII.15) that we cannot have a tractable
expression to work with in future periods. Furthermore if we assume
that at the beginning of period t+1 the random variable $\lambda_{t+1}$ has a
density function of the form given by (AII.15), and if in addition we
assume that new information is available that comes from an exponential
process, then the posterior distribution cannot be shown to be of the
form (AII.15).
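
Although (AII.15) is not convenient analytically, it can still be evaluated
numerically. The Python sketch below compares a power-series evaluation of
the expression with direct quadrature of the convolution integral (AII.12).
All function names and parameter values are hypothetical, the series is
truncated at a fixed number of terms (adequate only for moderate
arguments), and the midpoint rule is a crude stand-in for a proper
integrator.

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma density in the form of (AII.4) and (AII.5)."""
    if x <= 0.0:
        return 0.0
    return math.exp((shape - 1.0) * math.log(x) - x / scale
                    - math.lgamma(shape) - shape * math.log(scale))


def hyp1f1(p, q, x, terms=200):
    """Truncated power series of the degenerate hypergeometric function."""
    total, term = 1.0, 1.0
    for k in range(terms):
        term *= (p + k) / (q + k) * x / (k + 1.0)
        total += term
    return total


def density_aii15(z, a, b, alpha, beta):
    """Expression (AII.15) for the density of z = lambda + epsilon."""
    front = math.exp(-z / beta + (a + alpha - 1.0) * math.log(z)
                     - alpha * math.log(beta) - a * math.log(b)
                     - math.lgamma(a + alpha))
    return front * hyp1f1(a, a + alpha, (1.0 / beta - 1.0 / b) * z)


def density_by_quadrature(z, a, b, alpha, beta, steps=20000):
    """Midpoint-rule approximation of the convolution integral (AII.12)."""
    h = z / steps
    return h * sum(gamma_pdf(z - (i + 0.5) * h, alpha, beta)
                   * gamma_pdf((i + 0.5) * h, a, b) for i in range(steps))


# Hypothetical parameter values with unequal scales b and beta.
print(density_aii15(3.0, 3.0, 2.0, 2.0, 1.0))
print(density_by_quadrature(3.0, 3.0, 2.0, 2.0, 1.0))
```

Setting beta equal to b makes the hypergeometric argument zero, so the
factor equals one and the expression collapses to the gamma density of the
equal-scale case.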

The previous analysis assumed that the random shock model was
of the form (AII.1), that is $\lambda_{t+1} = \lambda_t + \epsilon_{t+1}$. If we assume now that
equation (AII.2) describes the nonstationarity condition on the mean
of the data generating process, i.e., $\lambda_{t+1} = \lambda_t\, \epsilon_{t+1}$, then we could show
that even in the simple case where both scale parameters have a value
of one we could not find tractable results. In any given period t,
assume that the posterior distribution of $\lambda$ is given by (AII.4) and
that the distribution of $\epsilon$ is given by (AII.5). Mood, Graybill and
Boes (1974) state that for two independent continuous random variables,
x and y, the distribution of their product z, i.e., z = xy, is given by

(AII.16)      $f(z) = \int_{-\infty}^{\infty} \{ f_{xy}(x, z/x)/|x| \}\, dx .$

Hence since $\lambda$ is positive, the distribution of the product of the
posterior of $\lambda_t$ and the nonstationary random shock $\epsilon_{t+1}$ is given by

(AII.17)      $f(z) = \int_0^{\infty} \{ \lambda^{a-1} \exp(-\lambda)\, [z/\lambda]^{\alpha-1} \exp[-z/\lambda]\,/\,|\lambda|\, \Gamma(a)\, \Gamma(\alpha) \}\, d\lambda ,$

or

(AII.18)      $f(z) = \dfrac{z^{\alpha-1}}{\Gamma(a)\, \Gamma(\alpha)} \int_0^{\infty} \lambda^{a-\alpha-1} \exp[-\lambda - (z/\lambda)]\, d\lambda .$

Gradshteyn and Ryshik (1965) state that

(AII.19)      $\int_0^{\infty} x^{\nu-1} \exp[-(\beta/x) - \gamma x]\, dx = 2\,(\beta/\gamma)^{\nu/2}\, K_{\nu}(2\sqrt{\beta\gamma}),$

where $K_{\nu}$ is a Bessel function of imaginary argument. Thus using the
relation (AII.19) in (AII.18), with $\nu = a - \alpha$, f(z) becomes

(AII.20)      $f(z) = 2\, z^{\alpha-1}\, z^{(a-\alpha)/2}\, K_{\nu}(2\sqrt{z})\,/\,\Gamma(a)\, \Gamma(\alpha) .$

This shows that even for the simple case where $\beta = b = 1$, the results are
not tractable. Additional problems of interest, like those studied in
the previous section, present additional complications.

Instead of assuming a gamma random shock we could assume an
exponential random shock to model nonstationary means in the data
generating process. Consider samples of n independent and identically
distributed observations $x_1, x_2, \ldots, x_n$ from an exponential process
as defined in (AII.3). Assume a gamma prior distribution for the
unknown parameter $\lambda$ as defined in (AII.4) and that an exponential
random shock is imposed on the unknown mean $\lambda$, i.e.,

(AII.21)      $f(\epsilon \mid \alpha, \beta) = \alpha \exp[-\alpha\epsilon], \qquad 0 < \epsilon < \infty, \quad \alpha > 0, \quad \beta > 0.$
If the equation that describes the nonstationary condition of the
mean is (AII.1) then two cases are relevant for analysis: in one
we assume that $\alpha = 1/b$ and in the other we do not make assumptions
about the parameters. When we assume that $\alpha$ and 1/b are equal to a
constant, say w, then the convolution z of the random variables $\lambda$ and
$\epsilon$ has a density function given by

(AII.22)      $f(z) = \int_0^{z} \{ w \exp[-w(z - \lambda)]\, \lambda^{a-1}\, [\exp(-\lambda w)]\, w^{a}/\Gamma(a) \}\, d\lambda ,$

or integrating,

(AII.23)      $f(z) = w^{a+1}\, [\exp(-wz)]\, z^{a}\,/\,\Gamma(a)\, a .$

If we define

(AII.24)      $w = 1/d,$

and

(AII.25)      $c = a + 1$

and substitute them in (AII.23), the density of z becomes

(AII.26)      $f(z) = \exp[-z/d]\, z^{c-1}\,/\,d^{c}\, \Gamma(c) ,$

which is easily recognized as a gamma distribution with parameters c
and d.

If we do not make assumptions about the parameters, the
convolution z has a density,

(AII.27)      $f(z) = \int_0^{z} \{ \alpha \exp[-\alpha(z - \lambda)]\, \lambda^{a-1} \exp(-\lambda/b) \}\,/\,\Gamma(a)\, b^{a}\ d\lambda ,$

or integrating

(AII.28)      $f(z) = \dfrac{\alpha \exp(-\alpha z)}{\Gamma(a)\, b^{a}} \int_0^{z} e^{\lambda[\alpha - (1/b)]}\, \lambda^{a-1}\, d\lambda .$

Gradshteyn and Ryshik (1965) state that

(AII.29)      $\int_0^{u} x^{\nu-1} \exp(-\mu x)\, dx = \mu^{-\nu}\, \gamma(\nu, \mu u),$

where $\gamma(a, x)$ is the incomplete gamma function. Hence if we use (AII.29),
the density of z could be rewritten as

(AII.30)      $f(z) = -[\alpha - (1/b)]^{-a}\ \gamma\big[a, -[\alpha - (1/b)]z\big] .$

In any given period t, with a posterior distribution on $\lambda_t$ which is
gamma and an exponential random shock, we cannot get closed forms
for the convolution of the variables. Furthermore the "closure under
sampling" property of the prior is lost with a prior of the form
(AII.30). For instance, suppose that the prior distribution of the mean
of an exponential process in any given period t is of the form (AII.30).
Consider the case, now, in which new information comes from a sample of
n observations from the exponential data generating process. The posterior
distribution of $\lambda_t$, determined by means of Bayes theorem, is given by

(AII.31)      $f''(\lambda_t \mid x) = \dfrac{-[\alpha - (1/b)]^{-a}\ \gamma\big[a, -\{\alpha - (1/b)\}\lambda_t\big]\ \lambda_t^{n}\ [\exp(-\lambda_t \sum x_i)]}{\int_0^{\infty} -[\alpha - (1/b)]^{-a}\ \gamma\big[a, -[\alpha - (1/b)]\lambda_t\big]\ \lambda_t^{n}\ [\exp(-\lambda_t \sum x_i)]\, d\lambda_t} ,$

or

(AII.32)      $f''(\lambda_t \mid x) = \dfrac{\gamma\big[a, -\{\alpha - (1/b)\}\lambda_t\big]\ \lambda_t^{n}\ [\exp(-\lambda_t \sum x_i)]}{\int_0^{\infty} \gamma\big[a, -\{\alpha - (1/b)\}\lambda_t\big]\ \lambda_t^{n}\ [\exp(-\lambda_t \sum x_i)]\, d\lambda_t} .$

Gradshteyn and Ryshik (1965) state that

(AII.33)      $\int_0^{\infty} x^{\mu-1} e^{-\beta x}\, \gamma(\nu, \alpha x)\, dx = \dfrac{\alpha^{\nu}\, \Gamma(\mu + \nu)}{\nu\, (\alpha + \beta)^{\mu+\nu}}\ {}_2F_1\{1, \mu + \nu; \nu + 1; \alpha/(\alpha + \beta)\},$

where ${}_2F_1(\cdot)$ is a Gauss hypergeometric function which in most cases is
indeterminate. Hence, the denominator of (AII.32) cannot be determined as
a closed form. Therefore, we cannot find a posterior distribution of
the form of the prior distribution.

Finally consider the case where samples come from an exponential
process [as defined in (AII.3)]; the prior distribution for the unknown
parameter is gamma [as defined in (AII.4)]; an exponential random shock
is imposed on the unknown parameter [as defined in (AII.21)]; and the
equation that describes the nonstationary condition of the mean is given
by $\lambda_{t+1} = \lambda_t\, \epsilon_{t+1}$. We will show that even for the simplest case, where
the scale parameter of the gamma distribution has a value of one, we
cannot get tractable results.

In any given period t, assume that the posterior distribution
of $\lambda$ is given by (AII.4) and that the distribution of $\epsilon$ is given by
(AII.21). If we assume that the scale parameter has a value of one,
the distribution of the product of the posterior distribution of $\lambda_t$
and the nonstationary random shock $\epsilon_{t+1}$, i.e., $z = \lambda_t\, \epsilon_{t+1}$, is given by

(AII.34)      $f(z) = \int_0^{\infty} \{ \lambda^{a-1}\, \alpha \exp[-\lambda - (\alpha z/\lambda)]\,/\,\lambda\, \Gamma(a) \}\, d\lambda ,$

or

(AII.35)      $f(z) = \dfrac{\alpha}{\Gamma(a)} \int_0^{\infty} \lambda^{a-2} \exp[-\lambda - (\alpha z/\lambda)]\, d\lambda .$

We could simplify (AII.35) by using the equality (AII.19) to rewrite
the integral in the equation. Hence the prior distribution of the mean
at the beginning of period t+1 has a density function

(AII.36)      $f(\lambda_{t+1}) = 2\, \alpha^{(a+1)/2}\, \lambda_{t+1}^{(a-1)/2}\, K_{a-1}\big[2\sqrt{\alpha \lambda_{t+1}}\big]\,/\,\Gamma(a) ;$

where as before $K_{\nu}(\cdot)$ is a Bessel function of imaginary argument,
that is

(AII.37)      $K_{\nu}(2\sqrt{z}) = \int_0^{\infty} [\exp(-2\sqrt{z}\, \cosh t)]\, \cosh(\nu t)\, dt .$
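
The Bessel-function form can at least be checked numerically. The Python
sketch below evaluates $K_{\nu}$ from the integral representation (AII.37) with
a trapezoid rule and compares expression (AII.36) with direct quadrature of
(AII.35). The function names, integration limits, step counts and
parameter values are all assumptions chosen for the illustration; the
limits suit moderate arguments only.

```python
import math


def bessel_k(nu, x, upper=20.0, steps=20000):
    """K_nu(x) by the trapezoid rule applied to the integral in (AII.37)."""
    h = upper / steps
    last = math.exp(-x * math.cosh(upper)) * math.cosh(nu * upper)
    total = 0.5 * (math.exp(-x) + last)      # half weights at both endpoints
    for i in range(1, steps):
        t = i * h
        total += math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return h * total


def product_density_aii36(z, a, alpha):
    """Expression (AII.36) for z = lambda * epsilon with unit gamma scale."""
    return (2.0 * alpha ** ((a + 1.0) / 2.0) * z ** ((a - 1.0) / 2.0)
            * bessel_k(a - 1.0, 2.0 * math.sqrt(alpha * z)) / math.gamma(a))


def product_density_by_quadrature(z, a, alpha, half_width=30.0, steps=60000):
    """Riemann sum for (AII.35) after substituting lambda = exp(s).

    The integrand vanishes numerically at both ends of the s range, so the
    endpoint weights do not matter here.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        lam = math.exp(-half_width + i * h)
        total += lam ** (a - 1.0) * math.exp(-lam - alpha * z / lam)
    return alpha * h * total / math.gamma(a)


# Hypothetical values: gamma shape a = 2.5, shock parameter alpha = 1.5.
print(product_density_aii36(0.8, 2.5, 1.5))
print(product_density_by_quadrature(0.8, 2.5, 1.5))
```

For order one half the Bessel function has the elementary form
sqrt(pi/(2x)) exp(-x), which provides an independent check on `bessel_k`.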

For the same nonstationary model, if we assume that the scale
parameter of the gamma distribution and the parameter of the exponential
random shock are equal, say to c, then the distribution of the product
of $\lambda_t$ and $\epsilon_{t+1}$ is given by

(AII.38)      $f(z) = \int_0^{\infty} \{ \lambda^{a-1}\, \{\exp[-c\lambda - (cz/\lambda)]\}\, c^{a+1}\,/\,\lambda\, \Gamma(a) \}\, d\lambda ,$

or

(AII.39)      $f(z) = \dfrac{c^{a+1}}{\Gamma(a)} \int_0^{\infty} \lambda^{a-2} \exp[-c\lambda - (cz/\lambda)]\, d\lambda ,$

or

(AII.40)      $f(z) = 2\, c^{a+1}\, z^{(a-1)/2}\, K_{a-1}[2c\sqrt{z}]\,/\,\Gamma(a) .$

In the case that the parameters are unrestricted, the
distribution of the product of the random variables has a density

(AII.41)      $f(z) = \int_0^{\infty} \{ \lambda^{a-1} \exp[-\lambda/b]\, \alpha \exp[-\alpha z/\lambda]\,/\,\lambda\, \Gamma(a)\, b^{a} \}\, d\lambda ,$

or

(AII.42)      $f(z) = \dfrac{\alpha}{\Gamma(a)\, b^{a}} \int_0^{\infty} \lambda^{a-2} \exp[-(\lambda/b) - (\alpha z/\lambda)]\, d\lambda ,$

or

(AII.43)      $f(z) = 2\alpha\, (\alpha z b)^{(a-1)/2}\, K_{a-1}\big(2\sqrt{\alpha z/b}\big)\,/\,\Gamma(a)\, b^{a} .$

In all three cases discussed before, it is clear that the procedure
does not yield tractable results. We cannot use f(z), i.e., $f(\lambda_{t+1})$,
as the prior distribution of the unknown mean at the beginning of time
period t+1.

APPENDIX III

APPENDIX TO CHAPTER FOUR
Algorithm to Determine Prediction Intervals for Lognormal
and LogStudent Distributions

A Bayesian prediction interval of cover $\gamma$ is defined as an
interval A such that

(AIII.1)      $P(A \mid y) = \int_A P(x \mid y)\, dx = \gamma .$

In general such a prediction interval is not unique. One particular
interval which we shall consider is defined as follows. A most plausible
Bayesian prediction interval of cover $\gamma$ (also called highest posterior
density [H.P.D.] interval) has the form

(AIII.2)      $A = \{ x : P(x \mid y) \ge k \},$

where the density level k is determined by $P(A \mid y) = \gamma$. If the prior
distributions are natural conjugate to the process then the predictive
distribution for lognormal processes is lognormal when $\mu$ is unknown
and $\sigma^2$ is known and is logStudent when $\mu$ and $\sigma^2$ are both unknown. The
construction of H.P.D. intervals becomes difficult for these
distributions since they are asymmetric. In this appendix we develop an
algorithm to compute the prediction intervals for these distributions.

If the predictive distribution is lognormal with mean m and
variance $\sigma^2/n$, then the H.P.D. interval of cover $\gamma$ is of the form (a,b)
where a and b are the solutions of

(AIII.3)      $\int_a^{b} f_{LN}(x \mid m, \sigma^2/n)\, dx = \gamma ,$

and among all the solutions they have the H.P.D. property. To determine
the values of a and b we developed a search procedure. If the predictive
distribution is logStudent then the H.P.D. interval of cover $\gamma$ is of the
form (a,b) where a and b are the solution of

(AIII.4)      $\int_a^{b} f_{LS}(x \mid m, n, v, d)\, dx = \gamma ,$

such that the H.P.D. property holds. Suppose that the predictive
distribution could be represented as in Figure AIII.1.


[Figure AIII.1   Predictive Distribution: density f(x) with the mode marked]

The search procedure works as follows:

(i) In the first iteration label the value of the density
function at the mode A, i.e., A = f(Mode); label the value of the
density function at the origin C, i.e., C = f(0). Select an arbitrary
initial point $a_2$ (greater than the mode) and find another point $a_1$ with
equal density. (See Figure AIII.2.) The value of the density function
for this initial value will be between points A and C; label it B, i.e.,
B = f($a_1$) = f($a_2$).

[Figure AIII.2   Predictive Distribution: density f(x) with the levels
A = f(Mode), B = f($a_1$) = f($a_2$) and C = f(0) marked, and the points
$a_1$, Mode and $a_2$ on the horizontal axis]
(ii) Determine the area under the curve between $a_1$ and $a_2$.

a) If $\int_{a_1}^{a_2} f(x)\, dx < \gamma$ then it means that the value of
the density function for the next point in the search will be between points
B and C. Relabel those points as

A = B
and
C = C;

then select the next point in the search. (See Figure AIII.3.)

[Figure AIII.3   Predictive Distribution: density f(x) with a search point
and the mode marked]

b) If $\int_{a_1}^{a_2} f(x)\, dx > \gamma$ then it means that the value
of the density function corresponding to the next point in the search
will be between points A and B. Relabel those points as

A = A
and
C = B;

then select the next point in the search. (See Figure AIII.4.)

[Figure AIII.4   Predictive Distribution: density f(x) with a search point
and the mode marked]

(iii) To select the new points for $a_1$ and $a_2$, in either case
ii-a or ii-b, take $a_2$ to be the solution to the following equation

(AIII.5)      $f(a_2) = C + .681\,(A - C)$

and then find $a_1$ with density equal to that at $a_2$, where f(·) is the
predictive density function. See Luenberger (1973) for a discussion of
the use of the golden section method, which uses the constant .681.

(iv) Once we find $a_1$ and $a_2$ (with equal density) we could
go to step (ii) and repeat the procedure.

The algorithm stops if it does not find the desired $\gamma$ content
interval within a specified number of iterations or if it finds the
interval to a specified precision, that is, if the absolute value of
the difference between the computed $\gamma$ content and the required $\gamma$ content
does not exceed a specified precision. A computer program was written to
determine the H.P.D. intervals for lognormal and logStudent distributions
using the previous algorithm.
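
Steps (i)-(iv) and the stopping rule can be sketched in Python for the
lognormal case. This is an illustration under stated assumptions, not the
program described in the text: bisection is used to solve for $a_1$ and
$a_2$, the lognormal distribution function is evaluated in closed form
through the error function, and the mode is taken analytically as
exp($\mu - \sigma^2$). The constant .681 is used exactly as printed above (the
usual golden-section ratio is about 0.618) and is exposed as the
hypothetical parameter `ratio`; all function names are likewise
hypothetical.

```python
import math


def lognormal_pdf(x, mu, sigma2):
    """Lognormal density; mu and sigma2 are the mean and variance of ln x."""
    if x <= 0.0:
        return 0.0
    s = math.sqrt(sigma2)
    z = (math.log(x) - mu) / s
    return math.exp(-0.5 * z * z) / (x * s * math.sqrt(2.0 * math.pi))


def lognormal_cdf(x, mu, sigma2):
    """Lognormal distribution function, via the error function."""
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - mu) / math.sqrt(sigma2)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def solve_level(pdf, level, lo, hi, increasing, iters=200):
    """Bisection for pdf(x) = level on a branch where pdf is monotone."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (pdf(mid) < level) == increasing:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def hpd_interval(pdf, cdf, mode, gamma, ratio=0.681, tol=1e-6, max_iter=200):
    """Steps (i)-(iv): search on the density level B between C and A."""
    upper = pdf(mode)   # A: density at the mode
    lower = pdf(0.0)    # C: density at the origin
    for iteration in range(1, max_iter + 1):
        level = lower + ratio * (upper - lower)          # B, as in (AIII.5)
        a1 = solve_level(pdf, level, 0.0, mode, True)    # left of the mode
        hi = 2.0 * mode
        while pdf(hi) >= level:                          # bracket the right root
            hi *= 2.0
        a2 = solve_level(pdf, level, mode, hi, False)    # right of the mode
        content = cdf(a2) - cdf(a1)
        if abs(content - gamma) <= tol:
            return a1, a2, content, level, iteration
        if content < gamma:
            upper = level    # case ii-a: next level lies between B and C
        else:
            lower = level    # case ii-b: next level lies between A and B
    raise RuntimeError("no interval found within max_iter iterations")


# Lognormal example: mu = 1, sigma^2 = .5, cover .90.
mu, sigma2 = 1.0, 0.5
pdf = lambda x: lognormal_pdf(x, mu, sigma2)
cdf = lambda x: lognormal_cdf(x, mu, sigma2)
a1, a2, content, level, iterations = hpd_interval(pdf, cdf, math.exp(mu - sigma2), 0.90)
print(a1, a2, content)
```

The same `hpd_interval` function would apply to a logStudent predictive
density if `pdf`, `cdf` and `mode` were supplied numerically; that
extension is not shown here.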

The computational work requires the use of some numerical
algorithms. We used three computer packages from the International
Mathematical and Statistical Libraries, Inc. Volume 2. To determine the
mode of the lognormal and logStudent distributions we used the subroutine
ZXMIN, which is a quasi-Newton algorithm for finding the minimum of a
function of N variables. To integrate the functions from $a_1$ to $a_2$ we
used DCADRE, which integrates a function f(x) from a to b using cautious
adaptive Romberg extrapolation. To determine the new values of $a_1$ and
$a_2$, say $a_1^*$ and $a_2^*$, we used the subroutine ZREAL1, which finds real
zeros of a real function f(x) where the initial guesses may not be good.
In Tables 1 and 2 we present some intervals computed for some lognormal
and logStudent distributions.

TABLE 2

PREDICTIVE INTERVALS FOR SOME LOGSTUDENT
PREDICTIVE DISTRIBUTIONS

   Parameters       Interval  Lower     Upper      Computed  Density  Number
                              limit     limit      interval  at the   of
 m    v    n    d   $\gamma$       $a_1$       $a_2$        $\gamma^*$     limits   iterations

 2    .4   15   10  .90       .2989     22.2919    .8999     .0083    14
 2    .5   14   10  .90       .4940     20.7564    .8999     .0094    13
 2    .5   15    9  .90       .4637     20.5035    .8999     .0097    12
 2    .5   15   10  .80       .97384    14.4370    .7999     .0237    12
 2    .5   15   10  .90       .5481     19.7881    .9001     .0107    10
 2    .5   15   10  .95       .2237     29.3595    .9499     .0034    16
 2    .5   15   11  .90       .5619     20.2267    .8999     .0101    11
 2    .5   16   10  .90       .5334     20.3135    .8999     .0099    14
 2    .6   15   10  .90       .8561     18.4218    .8999     .0121    14
 3    .5   15   10  .90       1.4081    55.3166    .8999     .0036    13

The algorithm is used to determine highest posterior density intervals
but can be used to determine any type of interval desired. Minor changes
in the computer program are needed to determine one-sided prediction
intervals or any other interval needed. To get any of the intervals shown
in Tables 1 and 2, the user needs to submit only the parameters of the
predictive distribution, the desired $\gamma$ content of the interval and the
value of the complete beta function, B{1/2, (d/2)}. The computer program
then gives as output all the information that appears in Tables 1
and 2. For instance, when the shift rate, $\mu$, and the spread parameter,
$\sigma^2$, of a lognormal predictive distribution are 1 and .5 respectively
and a .90 content interval is desired, the algorithm finds a .8999
content interval with limits .3988 and 6.8162. Similarly when the
parameters of a logStudent predictive distribution are (m=2, v=.4,
n=15, d=10) and a .90 content interval is required the algorithm finds
a .8999 content interval with limits .2989 and 22.2919. Below we
present a computer printout of the program to find predictive intervals
for the logStudent distribution.